blog.kdgregory.com: 2021

Saturday, December 18, 2021

My take on the Log4J 2.x vulnerability

A week ago, a lot of Java application developers learned that their applications harbored a severe vulnerability, courtesy of the Log4J 2.x logging framework. The vulnerability, CVE-2021-44228 allowed execution of arbitrary code on any system that logged unsanitized user input using the Log4J 2.x library. This prompted a lot of people to scramble to protect their systems, and a lot of people to take to Twitter with opinions of What It All Means.

As the maintainer of an open-source library that integrates with Log4J, I spent my Saturday morning understanding the bug and updating my examples to use versions of the library that mitigated the problem. Fortunately, my library is not directly affected, as long as its consumers don't use unsanitized data to configure the logging framework.

Having done that, and reading as much as I could on the issue (the LunaSec writeups are excellent), I've decided to voice my own thoughts on the matter. Some of which I haven't seen other people say, so they may be similarly interesting to you.

First, I think that it's important to recognize that this vulnerability — like most things that end up in the news — is not the result of a single issue. Instead, it's a combination of features, some of which make perfect sense:

Log4J 2.x provides string interpolation for configuration (aka “lookups”)
This is a great idea; I implemented it for my own logging library. The idea is that you don't want your configuration files to contain hardcoded values, such as the address of your central syslog daemon (or, in my case, the name of a CloudWatch log group). Log4J 2.x provides a long list of lookups, ranging from environment variables to information from the Docker container running your application.
One of the lookups retrieves data from the Java Naming and Directory Interface (JNDI) API
In large deployments, it's nice to be able to centralize your configuration. There are lots of tools to do this, but Java Enterprise Edition settled on JNDI, also known as the javax.naming package. JNDI is an umbrella API for retrieving data from different sources, such as LDAP.
javax.naming.spi.ObjectFactory supports loading code from a remote server
This is a somewhat dubious idea, but the JNDI SPI (service provider interface) spec justifies it: it lets you defined resources (such as printer drivers) that can be retrieved directly from JNDI. They use the example of a printer driver, and it should be a familiar example to anyone who has installed a printer on their home computer (do you know that the driver you installed is trustworthy?).
Note: this condition is necessary for remote-code execution, but not for data exfiltration.
The Log4J 2.x PatternLayout also supports string interpolation, with the same set of lookups
Here we're getting into questionable features in the library. I had no idea that this behavior existed, even though I dug deeply into the documentation and source code when implementing my appenders. The documentation for this layout class is quite long, and prior to the vulnerability, you had to infer the behavior based on the presence of the nolookups pattern option.
Users log unsanitized data
If users passed all of their logging messages through a sanitizer that looked for and removed the sequence ${, then the vulnerability wouldn't exist. Except nobody does that, because why would you have to? I call it out because I think it's important to consider what you're logging, and whether it might contain information that you don't want to log. As I say in my “effective logging” talk, passwords can hide in the darndest places.

These are the things that need to happen to make this vulnerability affect you. If you can prevent one of them from happening, you're not vulnerable … to this specific issue. And there are ways to do this, although they aren't things that you can easily implement once the vulnerability is discovered. I'm writing a separate post on how Cloud deployments can mitigate similar vulnerabilities.

For now, however, I want to focus on two of the responses that I saw online: pay for open-source maintainers, and inspecting code before using it. In my opinion, neither of these are valid, and they distract from preparing for the next serious vulnerability.

Response #1: inspect code before you use it.

For any non-trivial application, this is simply impossible.

The current “trunk” revision of log4j-core has 89,778 lines of code, excluding test classes (measured using find and wc). That doesn't count any add-on libraries that you might use to write to your preferred destination, such as my log4j-aws-appenders (which has over 10,000 lines of mainline code). And logging is a tiny part of a modern application, which is typically built using a framework such as Spring, and runs on an application server such as Tomcat or Jetty.

And even if your company is willing to pay for the staff to read all of that source code, what is the chance that they will discover a vulnerability? What is the chance that they will even learn all of its behaviors? I've spent quite a bit of time with the Log4J 2.x code and documentation, both to write my library and on a post describing how to write Log4J plugins, and I never realized that it applied string interpolation to the raw log messages. After I knew about the vulnerability, of course, it was easy to find the code responsible.

Response #2: we — or at least large companies — should be paying the maintainers of open-source projects.

I don't know the situation of the core Log4J developers, and whether or not they get paid for their work on the project. But I don't believe that this vulnerability was the result of an overworked, unpaid, solo developer. True, there was no discussion on the commit that introduced JNDI lookups (ignoring the tacky post-hoc commenters). But the code that applies string substitution to logging events has been in the library since the 2.0 release, and has been rewritten multiple times since then.

In fact, I think direct corporate sponsorship would lead to more unexpected behavior, because the sponsoring corporations will all have their own desires, and an expectation that those desires will be met. And if my experience in the corporate world is any guide, a developer that feels their paycheck is in jeopardy is much more likely to do something without giving it full thought.

So where does that leave us?

Unfortunately, I haven't seen very many practical responses (although, as I said, I'm writing a separate post to this effect). And I think that the reason for that is that the real answer puts the onus for vulnerabilities on us, the consumers of open-source software.

Linus's Law, coined by Eric S Raymond, is that “given enough eyeballs, all bugs are shallow.” And this played out in full view with this vulnerability: the Log4J team quickly found and patched the problem, and has been releasing additional patches since.² There have also been multiple third-parties that have written detailed evaluations of the vulnerability.³

But still, we, the consumers of this package, needed to update our builds to use the latest version. And then deploy those updated builds. If you didn't already have a process that allows you to quickly build and deploy a hotfix, then your weekend was shot. Even if you did get a hotfix out, you needed to spend time evaluating your systems to ensure that they weren't compromised (and beware that the most likely compromise was exfiltration of secrets!).

It's natural, in such circumstances, to feel resentment, and to look for external change to make such problems go away. Or maybe even to think about reducing your reliance on open-source projects.

But in my opinion, this is the wrong outcome. Instead, look at this event as a wake-up call to make your systems more robust. Be prepared to do a hotfix at a moment's notice. Utilize tools such as web application firewalls, which can be quickly updated to block malicious traffic. And improve your forensic logging, so that you can identify the effects of vulnerabilities after they appear (just don't log unsanitized input!).

Because this is not the last vulnerability that you will see.

1: You might be interested to learn that the Apache Software Foundation receives approximately $3 million a year (annual reports here). However, to the best of my knowledge, they do not pay stipends to core maintainers. I do not know how the core Log4J 2.x maintainers earn their living.

2: Log4J 2.x change report.

3: I particularly like the CloudFlare posts, especially the one that describes how attackers changed their payloads to avoid simple firewall rules. This post informed my belief that the goal of most attacks was to exfiltrate secrets rather than take over systems.

Wednesday, July 14, 2021

My take on "How to re:Invent"

It's back. AWS is taking over Las Vegas for a week filled with information, sales pitches, and corporate-friendly activities. And while COVID and the possibility of a “fourth wave” hang over the conference, I decided to sign up. Having been to re:Invent once, I now consider myself an expert on how to survive the week. Here are five of my suggestions.

#1: Wear comfortable shoes.: OK, everybody says this, but it bears repeating: you're going to do a lot of walking. It might take ten minutes to navigate from the front door of a hotel to the meeting rooms, following a labyrinthine path through the casino. To give you some numbers: my watch recorded 87,000 steps, or 44.6 miles, over the five days of the conference. That may be higher than average: I often walked between venues rather than find my way to the shuttle buses. But even if you “only” walk 30 miles, you'll still be thankful for doing it in a pair of running shoes.
#2: Stay in a “venue” hotel.: These are the hotels that host sessions and other sponsored content, as opposed to the “sleeping” hotels that just have rooms for attendees. There are several reasons to stay at a venue hotel, but in my opinion the most important is that it cuts down on the amount of walking that you have to do. Of my 87,000 steps, I estimate that 10,000 or more were taken up in walking from my room at the Park MGM to the Aria so that I could pick up a shuttle bus.
#3: Attend workshops, not sessions.: There are some great sessions at re:Invent, conducted by people who are intimately familiar with the service. If you have specific questions it's worth attending one of the “deep dives” and then walking up to the speaker afterward to ask those questions.
But, all of these sessions will be recorded, and you can watch them at your leisure. So if you don't have specific questions there's no reason to attend in-person. What you can't do after December 3rd is learn with an AWS instructor by your side (well, not for free anyway). Unfortunately, space for these workshops is limited, so sign up early for the ones you want (that said, scheduling at re:Invent is extremely fluid; at least half the sessions I attended were marked as full but then had spots open up an hour before they started).
#4: Fly out on Saturday.: If you're not from the United States, you may not realize that re:Invent takes place during the week after the Thanksgiving holiday. Thanksgiving is a time when people return home to visit family and friends, and then all of them get on a plane the following Sunday to return home. It's historically the busiest travel day of the year, and US airports are crowded and frantic with people who only fly on that weekend. Plus, airlines charge the highest rates of the year, because they know people will pay. Even if you have TSA/Pre, it's not fun.
If you're willing to fly out a day early, you avoid the crowds. Plus, you can save significantly on the airfare (right now, it would save me over $300, or nearly 50%). Against that, you'll be paying for an extra night in Vegas. For me, with the conference rate for the hotel, the numbers worked.
#5: Get out of Vegas.: For some people, Vegas is a destination: they love the lights, the noise, and the constant activity. For me, it's overwhelming. Fortunately, you can find thousands of square miles of absolute desolation just outside city limits.
Last time, I rented a motorcycle for a day and explored nearby attractions: Valley of Fire, Hoover Dam, and Red Rock Canyon. This year, I'm planning to take three days and explore southern Utah and northern Arizona. If you're not a motorcyclist, Vegas also has plenty of rental cars, including exotics. And at the far end of the scale, you can spend a day in a high-performance driving class at the Las Vegas Motor Speedway.

Well, that's it. Now it's time to cross my fingers and hope the US COVID situation remains under control.

Tuesday, June 8, 2021

An open letter to the AWS Training organization

You don't have a Feedback link on your site, but it seems that Amazon keeps close tabs on the Blogosphere, so hopefully this reaches you.

I don't know whether you're an actual sub-division of Amazon, but the website URL https://www.aws.training certainly didn't give me a warm fuzzy feeling when it came up in Google. In fact, my first thought was that it was some unaffiliated company that had better SEO.

So, since it was asking me for login credentials, I did what any reasonably cautious technologist would do, and ran whois. And this is what I got back:

Domain Name: aws.training
Registry Domain ID: 8d519b3def254d2f980a08f62416a5b9-DONUTS
Registrar WHOIS Server: whois.comlaude.com
Registrar URL: http://www.comlaude.com
Updated Date: 2019-05-19T19:54:24Z
Creation Date: 2014-03-19T00:32:11Z
Registry Expiry Date: 2024-03-19T00:32:11Z
Registrar: Nom-iq Ltd. dba COM LAUDE
Registrar IANA ID: 470
Registrar Abuse Contact Email: abuse@comlaude.com
Registrar Abuse Contact Phone: +44.2074218250
Registrant Name: REDACTED FOR PRIVACY
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: REDACTED FOR PRIVACY
Registrant City: REDACTED FOR PRIVACY
Registrant State/Province: NV
Registrant Postal Code: REDACTED FOR PRIVACY
Registrant Country: US
Registrant Phone: REDACTED FOR PRIVACY
Registrant Phone Ext: REDACTED FOR PRIVACY
Registrant Fax: REDACTED FOR PRIVACY
Registrant Fax Ext: REDACTED FOR PRIVACY
Registrant Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.

That's the sort of whois entry that you get for an individual using a shared hosting service. In fact, it provides less information than you'll see with my domain, which runs on a shared hosting service, and I pay extra for privacy.

By comparison, the whois entry for Amazon itself looks like this (and note that it's a different registrar, another red flag):

Domain Name: amazon.com
Registry Domain ID: 281209_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2019-08-26T12:19:56-0700
Creation Date: 1994-10-31T21:00:00-0800
Registrar Registration Expiration Date: 2024-10-30T00:00:00-0700
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2083895770
Registrant Name: Hostmaster, Amazon Legal Dept.
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: P.O. Box 8102
Registrant City: Reno
Registrant State/Province: NV
Registrant Postal Code: 89507
Registrant Country: US
Registrant Phone: +1.2062664064
Registrant Phone Ext: 
Registrant Fax: +1.2062667010
Registrant Fax Ext: 
Registrant Email: hostmaster@amazon.com

While I'm a little surprised by the Reno address, rather than Seattle, this at least looks like the sort of registration information used by a business rather than somebody who pays $10/month for hosting.

I ended up getting to the training site via a link on the AWS Console, so was able to achieve my goal.

But I think there's a general lesson: don't forsake your brand without good reason.

And at the very least, ask your network administrators to update your whois data.

Friday, February 26, 2021

Java8 Lambda Startup Times

A few months ago I wrote a post about startup times of AWS Lambdas written in Java. This post has a similar title, but a different topic: it looks at the first-run time for lambdas (lowercase) in a Java program, and has nothing to do with AWS. Although I did discover this issue while writing code for AWS. Confused yet?

Lambda expressions were added to the Java language with the release of Java 8 in 2014. By now I'm assuming every Java programmer has used them, if only as arguments to higher-order functions in the java.util.stream package:

List<String> uppercasedNames = names.stream()
                               .map(s -> s.toUpperCase())
                               .collect(Collectors.toList());

You can implement your own higher-order functions, with parameter types from the java.util.function package (or, if you have more complex needs, defining your own functional interfaces). So, for example, you might have a function with this signature:

public static String retryLambda(long interval, long timeout, Supplier<String> lambda) throws Exception

This can be called with any lambda expression that doesn't take arguments and returns a string. For example:

retryLambda(50, 100, () -> Instant.now().toString());

As you might have guessed from the signature, this function retries some operation. But before I dig into the implementation, here's some background about why you'd implement such a function. Most of my recent posts have referenced my AWS logging library, and this one's no different. When working with AWS, you need to be prepared to retry operations: either because AWS throttled the request (returning an error that indicates you should retry after a short delay), or because operations are eventually-consistent (there's a delay between creating something and being able to use it). As a result, AWS code can include a lot of retry loops:*

long timeoutAt = System.currentTimeMillis() + timeout;
while (System.currentTimeMillis() < timeoutAt)
{
    String value = doSomething();
    if (value != null)
        return value;
    Thread.sleep(interval);
}
throw new RuntimeException("timeout expired");

That's seven lines of boilerplate wrapping one line that actually does something. Functional programming is all about getting rid of boilerplate, so I implemented a function that would accept a lambda:**

public static String retryLambda(long interval, long timeout, Supplier<String> lambda) throws Exception
{   
    long timeoutAt = System.currentTimeMillis() + timeout;
    while (System.currentTimeMillis() < timeoutAt)
    {
        String value = lambda.get();
        if (value != null)
            return value;
        Thread.sleep(interval);
    }
    
    throw new RuntimeException("timeout expired");
}

The hardcoded loops can now be replaced with a call to this function:

retryLambda(50, 250, () -> doSomething());

All well and good, and it reduced the size of the code, but then my tests started failing.

When you're actually talking to AWS, you might need a timeout of 30 seconds or more. But you definitely don't want such a long timeout in a unit test. To solve that problem, I replaced the interval and timeout arguments with much shorter values: 50 and 200 milliseconds. And then my tests would assert the number of times the function was called: based on those values, the operation should be attempted four times before timing out. However, I was seeing that they were only executed two or three times.

When I dug into the problem, what I discovered is that the first execution of a lambda takes 40 to 50 milliseconds on my Core i7-3770K running Oracle Java 1.8.0_271. I knew there was a lot happening behind the scenes to make lambdas work, but wow, that's nearly infinity!

I also ran on an EC2 m5a.xlarge instance running AWS Linux 2, and saw that it took over 70 milliseconds with OpenJDK 1.8.0_272, but only 18 milliseconds running Corretto 11.0.10.9.1. I have to assume that the performance improvement is similar across Java11 implementations, but haven't tested. If you'd like to try it out yourself, I've created a GitHub Gist with the test program.

One thing that I do not want you to take from this post is the idea that Java lambdas are bad, or are poorly implemented. I didn't delve too deeply into what happens during that first invocation, but suspect that the JVM is loading something from disk (much like the initial JVM startup time). And in my experiments, invoking additional, different lambdas did not add to the execution time. So, like anything Java, lambdas are best used in a long-running program.

However, if you are in a similar situation, testing timing-dependent code that utilizes lambdas, you need to be prepared. When I ran into the problem, I simply wanted to move on with my life and relaxed the assertions (the primary assertion was elapsed time, which didn't change; it was simply the number of invocations). Now, after thinking about the problem and writing the example program for this post, I think I'd use a @BeforeClass function to “warm up” the lambda mechanism.

* Not all AWS code needs to have retry loops. But, for example, if you create a Kinesis stream you will need to wait until it becomes active before writing to it. I've seen some mock implementations of AWS services that don't accurately reflect these delays, leading to code that fails in the real world.

* Actually, I implemented a class, which was far easier to replace during testing. For an example if its use, look here.

Tuesday, February 16, 2021

EFS Build Performance Revisited

A few years ago I wrote a post calling out the poor performance of Amazon's Elastic File System (EFS) when used as the working directory for a software build. Since then, EFS has seen many performance improvements. Is it now viable for purposes such as developer home directories or build machines?

TL;DR: no.

As before, I'm using an m5d.xlarge EC2 instance running AWS Linux 2 as my testbed (for the record, ami-03c5cc3d1425c6d34 — you'll see later why I want to remember this). It provides four virtual CPUs and 16GB of RAM, so hardware should not be an issue. My test builds are the AWS SDK and my logging appenders project (releasing the latter is why I spun up the instance in the first place). The appenders project is larger than it was last time, but is still a reasonable “small” project. For consistency, I'm using the same tag (1.11.394) for the AWS SDK; it's grown dramatically in the interim.

I've configured the build machine with three users, each of which has their home directory in one of the tested storage types (instance store, EBS, EFS). The EBS test uses a 100 GB gp2 volume that is dedicated to the build user. For the EFS test I created two volumes, to compare the different EFS performance modes.

For each build I took the following steps:

Copy the project from a "reference" user. This user has project directories without the .git directory, along with a fully-populated Maven local repository.
Perform a test build. This is intended to ensure that all dependencies have been downloaded, and that there is nothing that would cause the build to fail.
Run mvn clean in the test directory.
Flush the disk cache (sync).
For instance store, run TRIM (fstrim -v /) to avoid the penalty of SSD write amplification.
Clear the in-memory buffer cache (echo 3 > /proc/sys/vm/drop_caches)
Run the timed build (time mvn compile).

And here's the results. As before, I show the output from the time command: the first number is the "real" time (the wall-clock time it took to build). The second is "user" (CPU) time, while the third is "system" (kernel operation) time. All times are minutes:seconds, and are rounded to the nearest second.

	Appenders			AWS SDK
	Real	User	System	Real	User	System
Instance Store	00:05	00:13	00:00	01:14	02:16	00:06
EBS	00:06	00:15	00:00	01:26	02:13	00:06
EFS General Purpose	00:23	00:20	00:01	15:29	02:22	00:15
EFS Max IO	00:55	00:18	00:01	36:24	02:28	00:15

Comparing these timings to my previous run, the first thing that jumped out at me was how wildly different reported “user” time is. In fact, they are so different that my first step was to fire up an EC2 instance using the same AMI as the previous test (thankfully, AWS doesn't delete anything), and confirm the numbers (and yes, they were consistent). Intuitively, it should take the same amount of CPU time to compile a project, regardless of the performance of the disk storage, so I'm not sure why I didn't do more digging when I saw the original numbers. Regardless, “real” time tells the story.

And that story is that EFS still takes significantly longer than other options.

There have been definite performance improvements: the “general purpose” EFS volume takes 15 minutes, versus the 30+ required by the earlier test (the close correspondence of the earlier test and the “MAX IO” volume type make me think that it might be the same implementation).

But if you're speccing a build machine — or anything else that needs to work with large numbers of relatively small files — EFS remains a poor choice.

Monday, February 1, 2021

Why I Won't Use The AWS V2 Java SDK

Update, 2024-01-24

Well, I was wrong: AWS is end-of-lifing the v1 SDK. They will stop adding new features in July of this year, and stop providing updates in December 2025. They just announced this plan, so I'm still waiting to see if they'll back-pedal under pressure from large customers. But until that happens, ignore my advice below and start converting your code.

That said, feel free to read my venting from three years ago ...

Amazon's "version 2" SDK for Java was first released three and a half years ago. As of this writing, the current version is 2.15.73. So it's time to upgrade your Java AWS projects, right?

To which my answer is a resounding “NO!” In fact, if you are starting a new java AWS project, I recommend that you stick with the version 1 SDK unless you have a compelling reason to change. Especially if you already have code that uses version 1.

I base these statements on my experience updating my logging library. While this library doesn't use a lot of AWS services, it goes deep on the ones that it does use, especially client configuration. For additional context, I've been using the version 1 Java SDK since it came out, I currently use the Python SDK very heavily, and I've also spent some time with the JavaScript and Ruby SDKs. Plus, I implemented my own S3 SDK before the Java SDK was available. In other words, I'm not just someone who spent an hour trying to work with the library, got fed up, and is now venting.

And before I do start venting, I want to call out the things that I do like about the new library:

Consistent Naming: This is big. It's not just a matter of getting rid of the superfluous “Amazon” or ”AWS” prefix, but of consistent categorization. For example, CloudWatch Logs: that's its name, you find its documentation next to other things named CloudWatch, and you look at your logs by going to the CloudWatch page in the console. Yet in the v1 SDK it's called AWSLogs, and the JAR is aws-java-sdk-logs. A small thing, but small things are what enhance or detract from developer productivity.
Paginators: Paginated requests are a Good Thing: not only do they reduce the load on Amazon's servers, they avoid blowing up client programs (eg, downloading a gigabyte of data due to a poorly configured FilterLogEvents call). But dealing with pagination is a developer tax: the same dozen or so lines of code every time you make a paginated request (in fact, that was one of the primary drivers of my been-on-hold-because-I-mostly-work-with-Python-these-days AWS utility library). It's a lot easier to request a paginator and then loop through results or process them with a Java8 stream. I especially like that Java paginators don't force you to deal with pages at all, unlike the Python paginators.

And … that's about it. Now onto the things that I don't like:

Wholesale name changes of getters and setters

I understand why they did it: everybody hates Java's “bean” naming conventions. The get, set, and with prefixes are just visual clutter. All of the other SDKs use “property” naming, so why not make the Java SDK consistent? But the result, if you're planning to upgrade an existing project, means hours — perhaps days — of tedious manual effort to get rid of those prefixes. And at the same time, you'll be converting all of your code to use builders rather than constructors.

Just when you get used to the new conventions, you find a client that doesn't follow them

In my case, it was discovering that KinesisClient didn't support paginators, so I found myself re-implementing that dozen lines of code to iterate the list of streams. I also discovered that IamClient doesn't provide create(), unlike the other clients, but that brings up a bigger issue.

The documentation is incomplete and at times misleading

IamClient doesn't implement the create() method. That seems like an oversight, until you use IamClient.builder().build() and get an UnknownHostException with a long and confusing chain of causes. To get a usable client, you must add .region(Region.AWS_GLOBAL) to the builder invocation.

There is no hint of this in the IamClient documentation. The only AWS documentation that has anything to say on the topic is the Migration Guide, in which you'll see “Some AWS services don't have Region specific endpoints.” But no list of the classes affected, or examples of client configuration code. Oh, and sorry to amybody in China who might want to use my library: you've got a different region name.

That covers “incomplete.” For “misleading,” take a look at StsAssumeRoleCredentialsProvider. The version 1 variant of this class requires an STS client, which it uses to periodically refresh the credentials. For version 2, there's no indication that this method exists — moreover, the JavaDoc claims that the builder class extends Object, which should mean that the documented methods are all that exist. You have to look at the source code to see that it actually extends a BaseBuilder class (and has since the developer preview was released).

I think that the core problem in both cases is that perhaps 99.9% of everything in the SDK is generated from one central API specification. This is really the only way that you can manage multiple SDKs for over 100 services, each of which exposes dozens of operations. However, that 0.1% (or less) is what gives each SDK its personality, and it seems that the documentation generators for the v2 Java SDK aren't keeping up with the custom pieces. Fortunately, there's Google and Stack Overflow.

There are still missing capabilities

According to the migration guide, “higher level” libraries like the S3 Transfer Manager aren't available; if you need them you'll need to use both the V1 and V2 SDKs. I haven't verified this myself, but in the past I've used the Transfer Manager very heavily and have recommended it to others.

So what would be a compelling reason to switch? The only one I can think of is the ability to use HTTP/2 and true asynchronous clients. Both can improve performance when you're making large numbers of concurrent API calls. Of course, there are often better alternatives to making large numbers of API calls, especially calls that run the risk of being throttled (and a bonus complaint about AWS APIs in general: they have many different ways to report throttling).

One thing that is not compelling is the fear that one day AWS will decide to discontinue the version 1 SDK. First, because AWS never seems to deprecate/remove anything (does anyone still use SimpleDB? how about the Simple Workflow Service?). But more important is the existing code that uses version 1. I don't know how many millions of lines there are, but I suspect that it's large. And I also suspect that most of it belongs to companies that pay Amazon lots of money and would seriously reconsider cloud providers if required to update their codebases.

blog.kdgregory.com