Wednesday, July 14, 2021

My take on "How to re:Invent"

It's back. AWS is taking over Las Vegas for a week filled with information, sales pitches, and corporate-friendly activities. And while COVID and the possibility of a “fourth wave” hang over the conference, I decided to sign up. Having been to re:Invent once, I now consider myself an expert on how to survive the week. Here are five of my suggestions.

#1: Wear comfortable shoes.
OK, everybody says this, but it bears repeating: you're going to do a lot of walking. It might take ten minutes to navigate from the front door of a hotel to the meeting rooms, following a labyrinthine path through the casino. To give you some numbers: my watch recorded 87,000 steps, or 44.6 miles, over the five days of the conference. That may be higher than average: I often walked between venues rather than find my way to the shuttle buses. But even if you “only” walk 30 miles, you'll still be thankful for doing it in a pair of running shoes.
#2: Stay in a “venue” hotel.
These are the hotels that host sessions and other sponsored content, as opposed to the “sleeping” hotels that just have rooms for attendees. There are several reasons to stay at a venue hotel, but in my opinion the most important is that it cuts down on the amount of walking that you have to do. Of my 87,000 steps, I estimate that 10,000 or more were taken up in walking from my room at the Park MGM to the Aria so that I could pick up a shuttle bus.
#3: Attend workshops, not sessions.
There are some great sessions at re:Invent, conducted by people who are intimately familiar with the service. If you have specific questions it's worth attending one of the “deep dives” and then walking up to the speaker afterward to ask those questions.

But, all of these sessions will be recorded, and you can watch them at your leisure. So if you don't have specific questions there's no reason to attend in-person. What you can't do after December 3rd is learn with an AWS instructor by your side (well, not for free anyway). Unfortunately, space for these workshops is limited, so sign up early for the ones you want (that said, scheduling at re:Invent is extremely fluid; at least half the sessions I attended were marked as full but then had spots open up an hour before they started).

#4: Fly out on Saturday.
If you're not from the United States, you may not realize that re:Invent takes place during the week after the Thanksgiving holiday. Thanksgiving is a time when people travel to visit family and friends, and then all of them get on a plane the following Sunday to return home. It's historically the busiest travel day of the year, and US airports are crowded and frantic with people who only fly on that weekend. Plus, airlines charge the highest rates of the year, because they know people will pay. Even if you have TSA PreCheck, it's not fun.

If you're willing to fly out a day early, you avoid the crowds. Plus, you can save significantly on the airfare (right now, it would save me over $300, or nearly 50%). Against that, you'll be paying for an extra night in Vegas. For me, with the conference rate for the hotel, the numbers worked.

#5: Get out of Vegas.
For some people, Vegas is a destination: they love the lights, the noise, and the constant activity. For me, it's overwhelming. Fortunately, you can find thousands of square miles of absolute desolation just outside city limits.

Last time, I rented a motorcycle for a day and explored nearby attractions: Valley of Fire, Hoover Dam, and Red Rock Canyon. This year, I'm planning to take three days and explore southern Utah and northern Arizona. If you're not a motorcyclist, Vegas also has plenty of rental cars, including exotics. And at the far end of the scale, you can spend a day in a high-performance driving class at the Las Vegas Motor Speedway.

Well, that's it. Now it's time to cross my fingers and hope the US COVID situation remains under control.

Tuesday, June 8, 2021

An open letter to the AWS Training organization

You don't have a Feedback link on your site, but it seems that Amazon keeps close tabs on the Blogosphere, so hopefully this reaches you.

I don't know whether you're an actual sub-division of Amazon, but the website URL https://www.aws.training certainly didn't give me a warm fuzzy feeling when it came up in Google. In fact, my first thought was that it was some unaffiliated company that had better SEO.

So, since it was asking me for login credentials, I did what any reasonably cautious technologist would do, and ran whois. And this is what I got back:

Domain Name: aws.training
Registry Domain ID: 8d519b3def254d2f980a08f62416a5b9-DONUTS
Registrar WHOIS Server: whois.comlaude.com
Registrar URL: http://www.comlaude.com
Updated Date: 2019-05-19T19:54:24Z
Creation Date: 2014-03-19T00:32:11Z
Registry Expiry Date: 2024-03-19T00:32:11Z
Registrar: Nom-iq Ltd. dba COM LAUDE
Registrar IANA ID: 470
Registrar Abuse Contact Email: abuse@comlaude.com
Registrar Abuse Contact Phone: +44.2074218250
Registrant Name: REDACTED FOR PRIVACY
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: REDACTED FOR PRIVACY
Registrant City: REDACTED FOR PRIVACY
Registrant State/Province: NV
Registrant Postal Code: REDACTED FOR PRIVACY
Registrant Country: US
Registrant Phone: REDACTED FOR PRIVACY
Registrant Phone Ext: REDACTED FOR PRIVACY
Registrant Fax: REDACTED FOR PRIVACY
Registrant Fax Ext: REDACTED FOR PRIVACY
Registrant Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.

That's the sort of whois entry that you get for an individual using a shared hosting service. In fact, it provides less information than you'll see with my domain, which runs on a shared hosting service and for which I pay extra for privacy.

By comparison, the whois entry for Amazon itself looks like this (and note that it's a different registrar, another red flag):

Domain Name: amazon.com
Registry Domain ID: 281209_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2019-08-26T12:19:56-0700
Creation Date: 1994-10-31T21:00:00-0800
Registrar Registration Expiration Date: 2024-10-30T00:00:00-0700
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2083895770
Registrant Name: Hostmaster, Amazon Legal Dept.
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: P.O. Box 8102
Registrant City: Reno
Registrant State/Province: NV
Registrant Postal Code: 89507
Registrant Country: US
Registrant Phone: +1.2062664064
Registrant Phone Ext: 
Registrant Fax: +1.2062667010
Registrant Fax Ext: 
Registrant Email: hostmaster@amazon.com

While I'm a little surprised by the Reno address, rather than Seattle, this at least looks like the sort of registration information used by a business rather than somebody who pays $10/month for hosting.

I ended up getting to the training site via a link on the AWS Console, so was able to achieve my goal.

But I think there's a general lesson: don't forsake your brand without good reason.

And at the very least, ask your network administrators to update your whois data.

Friday, February 26, 2021

Java8 Lambda Startup Times

A few months ago I wrote a post about startup times of AWS Lambdas written in Java. This post has a similar title, but a different topic: it looks at the first-run time for lambdas (lowercase) in a Java program, and has nothing to do with AWS. Although I did discover this issue while writing code for AWS. Confused yet?

Lambda expressions were added to the Java language with the release of Java 8 in 2014. By now I'm assuming every Java programmer has used them, if only as arguments to higher-order functions in the java.util.stream package:

List<String> uppercasedNames = names.stream()
                               .map(s -> s.toUpperCase())
                               .collect(Collectors.toList());

You can implement your own higher-order functions, with parameter types from the java.util.function package (or, if you have more complex needs, your own functional interfaces). So, for example, you might have a function with this signature:

public static String retryLambda(long interval, long timeout, Supplier<String> lambda) throws Exception

This can be called with any lambda expression that doesn't take arguments and returns a string. For example:

retryLambda(50, 100, () -> Instant.now().toString());

As you might have guessed from the signature, this function retries some operation. But before I dig into the implementation, here's some background about why you'd implement such a function. Most of my recent posts have referenced my AWS logging library, and this one's no different. When working with AWS, you need to be prepared to retry operations: either because AWS throttled the request (returning an error that indicates you should retry after a short delay), or because operations are eventually-consistent (there's a delay between creating something and being able to use it). As a result, AWS code can include a lot of retry loops:*

long timeoutAt = System.currentTimeMillis() + timeout;
while (System.currentTimeMillis() < timeoutAt)
{
    String value = doSomething();
    if (value != null)
        return value;
    Thread.sleep(interval);
}
throw new RuntimeException("timeout expired");

That's seven lines of boilerplate wrapping one line that actually does something. Functional programming is all about getting rid of boilerplate, so I implemented a function that would accept a lambda:**

public static String retryLambda(long interval, long timeout, Supplier<String> lambda) throws Exception
{   
    long timeoutAt = System.currentTimeMillis() + timeout;
    while (System.currentTimeMillis() < timeoutAt)
    {
        String value = lambda.get();
        if (value != null)
            return value;
        Thread.sleep(interval);
    }
    
    throw new RuntimeException("timeout expired");
}

The hardcoded loops can now be replaced with a call to this function:

retryLambda(50, 250, () -> doSomething());

All well and good, and it reduced the size of the code, but then my tests started failing.

When you're actually talking to AWS, you might need a timeout of 30 seconds or more. But you definitely don't want such a long timeout in a unit test. To solve that problem, I replaced the interval and timeout arguments with much shorter values: 50 and 200 milliseconds. And then my tests would assert the number of times the function was called: based on those values, the operation should be attempted four times before timing out. However, I was seeing that they were only executed two or three times.

When I dug into the problem, what I discovered is that the first execution of a lambda takes 40 to 50 milliseconds on my Core i7-3770K running Oracle Java 1.8.0_271. I knew there was a lot happening behind the scenes to make lambdas work, but wow, that's nearly infinity!
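To see why a slow first call changes the count, here is a minimal sketch that models the retry loop with a virtual clock instead of real sleeps. It is an idealized model, not the code from my tests: the class name, the countAttempts function, and the firstCallCost parameter are all made up for illustration, and it assumes that Thread.sleep() is exact and that the lambda never succeeds.

```java
// Idealized model of the retry loop: a virtual clock, no real sleeping.
// firstCallCost is a hypothetical parameter standing in for the one-time
// delay of the first lambda invocation. All values are milliseconds.
class RetrySimulation
{
    static int countAttempts(long interval, long timeout, long firstCallCost)
    {
        long now = 0;
        int attempts = 0;
        while (now < timeout)
        {
            attempts++;
            if (attempts == 1)
                now += firstCallCost;   // only the first invocation is slow
            now += interval;            // stands in for Thread.sleep(interval)
        }
        return attempts;
    }

    public static void main(String[] argv)
    {
        System.out.println(countAttempts(50, 200, 0));    // prints 4
        System.out.println(countAttempts(50, 200, 50));   // prints 3
    }
}
```

In this model a 50 millisecond first call costs one attempt. Real runs also have sleep jitter and other overhead, which this sketch ignores.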

I also ran on an EC2 m5a.xlarge instance running AWS Linux 2, and saw that it took over 70 milliseconds with OpenJDK 1.8.0_272, but only 18 milliseconds running Corretto 11.0.10.9.1. I have to assume that the performance improvement is similar across Java11 implementations, but haven't tested. If you'd like to try it out yourself, I've created a GitHub Gist with the test program.
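A measurement of this kind can be sketched roughly as follows. This is not the Gist itself, just a minimal illustration with made-up names. It assumes that the one-time cost is paid where the first lambda expression is evaluated, so each timed region includes both creating and invoking the lambda.

```java
import java.util.function.Supplier;

// Rough timing of the first and second lambda use in a fresh JVM.
// Each method contains its own lambda expression so that creation
// and invocation both fall inside the timed region.
class FirstLambdaTiming
{
    static long timeFirstLambda()
    {
        long start = System.nanoTime();
        Supplier<String> supplier = () -> "first";
        supplier.get();
        return System.nanoTime() - start;
    }

    static long timeSecondLambda()
    {
        long start = System.nanoTime();
        Supplier<String> supplier = () -> "second";
        supplier.get();
        return System.nanoTime() - start;
    }

    public static void main(String[] argv)
    {
        long first = timeFirstLambda();
        long second = timeSecondLambda();
        System.out.println("first lambda:  " + (first / 1000) + " microseconds");
        System.out.println("second lambda: " + (second / 1000) + " microseconds");
    }
}
```

The numbers will vary by machine and JVM, so run it several times (each in a new JVM) before drawing conclusions.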

One thing that I do not want you to take from this post is the idea that Java lambdas are bad, or are poorly implemented. I didn't delve too deeply into what happens during that first invocation, but suspect that the JVM is loading something from disk (much like the initial JVM startup time). And in my experiments, invoking additional, different lambdas did not add to the execution time. So, like anything Java, lambdas are best used in a long-running program.

However, if you are in a similar situation, testing timing-dependent code that utilizes lambdas, you need to be prepared. When I ran into the problem, I simply wanted to move on with my life, so I relaxed the assertions (the primary assertion was on elapsed time, which didn't change; only the number of invocations did). Now, after thinking about the problem and writing the example program for this post, I think I'd use a @BeforeClass function to “warm up” the lambda mechanism.
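A warm-up along those lines might look like the sketch below. The class and method names are hypothetical, and JUnit is left out so that the example stands alone; in a JUnit 4 test class you would call warmUp() from a public static method annotated with @BeforeClass. It relies on my observation above that later, different lambdas did not add to the execution time.

```java
import java.util.function.Supplier;

// Hypothetical helper: evaluate and invoke one throwaway lambda so that the
// one-time initialization cost is paid before any timing-sensitive test runs.
class LambdaWarmup
{
    // Storing the result in a field keeps the call from looking like dead code.
    static volatile String sink;

    static void warmUp()
    {
        Supplier<String> throwaway = () -> "warm-up";
        sink = throwaway.get();
    }

    public static void main(String[] argv)
    {
        warmUp();
        System.out.println("lambda machinery initialized: " + sink);
    }
}
```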


* Not all AWS code needs to have retry loops. But, for example, if you create a Kinesis stream you will need to wait until it becomes active before writing to it. I've seen some mock implementations of AWS services that don't accurately reflect these delays, leading to code that fails in the real world.

** Actually, I implemented a class, which was far easier to replace during testing. For an example of its use, look here.

Tuesday, February 16, 2021

EFS Build Performance Revisited

A few years ago I wrote a post calling out the poor performance of Amazon's Elastic File System (EFS) when used as the working directory for a software build. Since then, EFS has seen many performance improvements. Is it now viable for purposes such as developer home directories or build machines?

TL;DR: no.

As before, I'm using an m5d.xlarge EC2 instance running AWS Linux 2 as my testbed (for the record, ami-03c5cc3d1425c6d34 — you'll see later why I want to remember this). It provides four virtual CPUs and 16GB of RAM, so hardware should not be an issue. My test builds are the AWS SDK and my logging appenders project (releasing the latter is why I spun up the instance in the first place). The appenders project is larger than it was last time, but is still a reasonable “small” project. For consistency, I'm using the same tag (1.11.394) for the AWS SDK; it's grown dramatically in the interim.

I've configured the build machine with three users, each of which has their home directory in one of the tested storage types (instance store, EBS, EFS). The EBS test uses a 100 GB gp2 volume that is dedicated to the build user. For the EFS test I created two volumes, to compare the different EFS performance modes.

For each build I took the following steps:

  1. Copy the project from a "reference" user. This user has project directories without the .git directory, along with a fully-populated Maven local repository.
  2. Perform a test build. This is intended to ensure that all dependencies have been downloaded, and that there is nothing that would cause the build to fail.
  3. Run mvn clean in the test directory.
  4. Flush the disk cache (sync).
  5. For instance store, run TRIM (fstrim -v /) to avoid the penalty of SSD write amplification.
  6. Clear the in-memory buffer cache (echo 3 > /proc/sys/vm/drop_caches)
  7. Run the timed build (time mvn compile).

And here are the results. As before, I show the output from the time command: the first number is the "real" time (the wall-clock time it took to build). The second is "user" (CPU) time, while the third is "system" (kernel operation) time. All times are minutes:seconds, and are rounded to the nearest second.

                     Appenders                AWS SDK
                     Real   User   System     Real   User   System
Instance Store       00:05  00:13  00:00      01:14  02:16  00:06
EBS                  00:06  00:15  00:00      01:26  02:13  00:06
EFS General Purpose  00:23  00:20  00:01      15:29  02:22  00:15
EFS Max IO           00:55  00:18  00:01      36:24  02:28  00:15

Comparing these timings to my previous run, the first thing that jumped out at me was how wildly different the reported “user” times are. In fact, they are so different that my first step was to fire up an EC2 instance using the same AMI as the previous test (thankfully, AWS doesn't delete anything), and confirm the numbers (and yes, they were consistent). Intuitively, it should take the same amount of CPU time to compile a project, regardless of the performance of the disk storage, so I'm not sure why I didn't do more digging when I saw the original numbers. Regardless, “real” time tells the story.

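And that story is that EFS still takes significantly longer than other options: the AWS SDK build took more than twelve times as long on the general purpose volume as it did on instance store (15:29 versus 01:14).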

There have been definite performance improvements: the “general purpose” EFS volume takes 15 minutes, versus the 30+ required by the earlier test (the close correspondence of the earlier test and the “Max IO” volume type makes me think that it might be the same implementation).

But if you're speccing a build machine — or anything else that needs to work with large numbers of relatively small files — EFS remains a poor choice.

Monday, February 1, 2021

Why I Won't Use The AWS V2 Java SDK

Amazon's "version 2" SDK for Java was first released three and a half years ago. As of this writing, the current version is 2.15.73. So it's time to upgrade your Java AWS projects, right?

To which my answer is a resounding “NO!” In fact, if you are starting a new Java AWS project, I recommend that you stick with the version 1 SDK unless you have a compelling reason to change. Especially if you already have code that uses version 1.

I base these statements on my experience updating my logging library. While this library doesn't use a lot of AWS services, it goes deep on the ones that it does use, especially client configuration. For additional context, I've been using the version 1 Java SDK since it came out, I currently use the Python SDK very heavily, and I've also spent some time with the JavaScript and Ruby SDKs. Plus, I implemented my own S3 SDK before the Java SDK was available. In other words, I'm not just someone who spent an hour trying to work with the library, got fed up, and is now venting.

And before I do start venting, I do want to call out the things that I do like about the new library:

Consistent Naming
This is big. It's not just a matter of getting rid of the superfluous “Amazon” or “AWS” prefix, but of consistent categorization. For example, CloudWatch Logs: that's its name, you find its documentation next to other things named CloudWatch, and you look at your logs by going to the CloudWatch page in the console. Yet in the v1 SDK it's called AWSLogs, and the JAR is aws-java-sdk-logs. A small thing, but small things are what enhance or detract from developer productivity.
Paginators
Paginated requests are a Good Thing: not only do they reduce the load on Amazon's servers, they avoid blowing up client programs (eg, downloading a gigabyte of data due to a poorly configured FilterLogEvents call). But dealing with pagination is a developer tax: the same dozen or so lines of code every time you make a paginated request (in fact, that was one of the primary drivers of my been-on-hold-because-I-mostly-work-with-Python-these-days AWS utility library). It's a lot easier to request a paginator and then loop through results or process them with a Java8 stream. I especially like that Java paginators don't force you to deal with pages at all, unlike the Python paginators.
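For anyone who hasn't paid that tax, the hand-written loop looks something like the sketch below. It does not use the AWS SDK: Page, fetchAll, and fakeApi are hypothetical stand-ins for a token-based paginated API, so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ManualPagination
{
    // Hypothetical stand-in for a paginated response: some items, plus a
    // token for the next page (null when there are no more pages).
    static class Page
    {
        final List<String> items;
        final String nextToken;

        Page(List<String> items, String nextToken)
        {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    // The loop that gets repeated for every paginated call: request a page,
    // collect its items, and continue until no token comes back.
    static List<String> fetchAll(Function<String,Page> api)
    {
        List<String> result = new ArrayList<>();
        String token = null;
        do
        {
            Page page = api.apply(token);
            result.addAll(page.items);
            token = page.nextToken;
        } while (token != null);
        return result;
    }

    // Fake three-page "service", used only to exercise the loop.
    static Page fakeApi(String token)
    {
        if (token == null)
            return new Page(Arrays.asList("a", "b"), "page2");
        if (token.equals("page2"))
            return new Page(Arrays.asList("c"), "page3");
        return new Page(Arrays.asList("d", "e"), null);
    }

    public static void main(String[] argv)
    {
        System.out.println(fetchAll(ManualPagination::fakeApi));   // prints [a, b, c, d, e]
    }
}
```

A paginator hides the token handling inside an iterable, so the calling code only sees the items.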

And … that's about it. Now onto the things that I don't like:

Wholesale name changes of getters and setters
I understand why they did it: everybody hates Java's “bean” naming conventions. The get, set, and with prefixes are just visual clutter. All of the other SDKs use “property” naming, so why not make the Java SDK consistent? But the result, if you're planning to upgrade an existing project, means hours — perhaps days — of tedious manual effort to get rid of those prefixes. And at the same time, you'll be converting all of your code to use builders rather than constructors.
Just when you get used to the new conventions, you find a client that doesn't follow them
In my case, it was discovering that KinesisClient didn't support paginators, so I found myself re-implementing that dozen lines of code to iterate the list of streams. I also discovered that IamClient doesn't provide create(), unlike the other clients, but that brings up a bigger issue.
The documentation is incomplete and at times misleading
IamClient doesn't implement the create() method. That seems like an oversight, until you use IamClient.builder().build() and get an UnknownHostException with a long and confusing chain of causes. To get a usable client, you must add .region(Region.AWS_GLOBAL) to the builder invocation.

There is no hint of this in the IamClient documentation. The only AWS documentation that has anything to say on the topic is the Migration Guide, in which you'll see “Some AWS services don't have Region specific endpoints.” But no list of the classes affected, or examples of client configuration code. Oh, and sorry to anybody in China who might want to use my library: you've got a different region name.

That covers “incomplete.” For “misleading,” take a look at StsAssumeRoleCredentialsProvider. The version 1 variant of this class requires an STS client, which it uses to periodically refresh the credentials. For version 2, the documentation gives no indication that a method to provide that client exists. Moreover, the JavaDoc claims that the builder class extends Object, which should mean that the documented methods are all that exist. You have to look at the source code to see that it actually extends a BaseBuilder class (and has since the developer preview was released).

I think that the core problem in both cases is that perhaps 99.9% of everything in the SDK is generated from one central API specification. This is really the only way that you can manage multiple SDKs for over 100 services, each of which exposes dozens of operations. However, that 0.1% (or less) is what gives each SDK its personality, and it seems that the documentation generators for the v2 Java SDK aren't keeping up with the custom pieces. Fortunately, there's Google and Stack Overflow.

There are still missing capabilities
According to the migration guide, “higher level” libraries like the S3 Transfer Manager aren't available; if you need them you'll need to use both the V1 and V2 SDKs. I haven't verified this myself, but in the past I've used the Transfer Manager very heavily and have recommended it to others.

So what would be a compelling reason to switch? The only one I can think of is the ability to use HTTP/2 and true asynchronous clients. Both can improve performance when you're making large numbers of concurrent API calls. Of course, there are often better alternatives to making large numbers of API calls, especially calls that run the risk of being throttled (and a bonus complaint about AWS APIs in general: they have many different ways to report throttling).

One thing that is not compelling is the fear that one day AWS will decide to discontinue the version 1 SDK. First, because AWS never seems to deprecate/remove anything (does anyone still use SimpleDB? how about the Simple Workflow Service?). But more important is the existing code that uses version 1. I don't know how many millions of lines there are, but I suspect that the number is large. And I also suspect that most of it belongs to companies that pay Amazon lots of money and would seriously reconsider cloud providers if required to update their codebases.