Monday, February 1, 2021

Why I Won't Use The AWS V2 Java SDK

Update, 2024-01-24

Well, I was wrong: AWS is end-of-lifing the v1 SDK. They will stop adding new features in July of this year, and stop providing updates in December 2025. They just announced this plan, so I'm still waiting to see if they'll back-pedal under pressure from large customers. But until that happens, ignore my advice below and start converting your code.

That said, feel free to read my venting from three years ago ...

Amazon's "version 2" SDK for Java was first released three and a half years ago. As of this writing, the current version is 2.15.73. So it's time to upgrade your Java AWS projects, right?

To which my answer is a resounding “NO!” In fact, if you are starting a new java AWS project, I recommend that you stick with the version 1 SDK unless you have a compelling reason to change. Especially if you already have code that uses version 1.

I base these statements on my experience updating my logging library. While this library doesn't use a lot of AWS services, it goes deep on the ones that it does use, especially client configuration. For additional context, I've been using the version 1 Java SDK since it came out, I currently use the Python SDK very heavily, and I've also spent some time with the JavaScript and Ruby SDKs. Plus, I implemented my own S3 SDK before the Java SDK was available. In other words, I'm not just someone who spent an hour trying to work with the library, got fed up, and is now venting.

And before I do start venting, I want to call out the things that I do like about the new library:

Consistent Naming
This is big. It's not just a matter of getting rid of the superfluous “Amazon” or ”AWS” prefix, but of consistent categorization. For example, CloudWatch Logs: that's its name, you find its documentation next to other things named CloudWatch, and you look at your logs by going to the CloudWatch page in the console. Yet in the v1 SDK it's called AWSLogs, and the JAR is aws-java-sdk-logs. A small thing, but small things are what enhance or detract from developer productivity.
Paginators
Paginated requests are a Good Thing: not only do they reduce the load on Amazon's servers, they avoid blowing up client programs (eg, downloading a gigabyte of data due to a poorly configured FilterLogEvents call). But dealing with pagination is a developer tax: the same dozen or so lines of code every time you make a paginated request (in fact, that was one of the primary drivers of my been-on-hold-because-I-mostly-work-with-Python-these-days AWS utility library). It's a lot easier to request a paginator and then loop through results or process them with a Java8 stream. I especially like that Java paginators don't force you to deal with pages at all, unlike the Python paginators.

And … that's about it. Now onto the things that I don't like:

Wholesale name changes of getters and setters
I understand why they did it: everybody hates Java's “bean” naming conventions. The get, set, and with prefixes are just visual clutter. All of the other SDKs use “property” naming, so why not make the Java SDK consistent? But the result, if you're planning to upgrade an existing project, means hours — perhaps days — of tedious manual effort to get rid of those prefixes. And at the same time, you'll be converting all of your code to use builders rather than constructors.
Just when you get used to the new conventions, you find a client that doesn't follow them
In my case, it was discovering that KinesisClient didn't support paginators, so I found myself re-implementing that dozen lines of code to iterate the list of streams. I also discovered that IamClient doesn't provide create(), unlike the other clients, but that brings up a bigger issue.
The documentation is incomplete and at times misleading
IamClient doesn't implement the create() method. That seems like an oversight, until you use IamClient.builder().build() and get an UnknownHostException with a long and confusing chain of causes. To get a usable client, you must add .region(Region.AWS_GLOBAL) to the builder invocation.

There is no hint of this in the IamClient documentation. The only AWS documentation that has anything to say on the topic is the Migration Guide, in which you'll see “Some AWS services don't have Region specific endpoints.” But no list of the classes affected, or examples of client configuration code. Oh, and sorry to amybody in China who might want to use my library: you've got a different region name.

That covers “incomplete.” For “misleading,” take a look at StsAssumeRoleCredentialsProvider. The version 1 variant of this class requires an STS client, which it uses to periodically refresh the credentials. For version 2, there's no indication that this method exists — moreover, the JavaDoc claims that the builder class extends Object, which should mean that the documented methods are all that exist. You have to look at the source code to see that it actually extends a BaseBuilder class (and has since the developer preview was released).

I think that the core problem in both cases is that perhaps 99.9% of everything in the SDK is generated from one central API specification. This is really the only way that you can manage multiple SDKs for over 100 services, each of which exposes dozens of operations. However, that 0.1% (or less) is what gives each SDK its personality, and it seems that the documentation generators for the v2 Java SDK aren't keeping up with the custom pieces. Fortunately, there's Google and Stack Overflow.

There are still missing capabilities
According to the migration guide, “higher level” libraries like the S3 Transfer Manager aren't available; if you need them you'll need to use both the V1 and V2 SDKs. I haven't verified this myself, but in the past I've used the Transfer Manager very heavily and have recommended it to others.

So what would be a compelling reason to switch? The only one I can think of is the ability to use HTTP/2 and true asynchronous clients. Both can improve performance when you're making large numbers of concurrent API calls. Of course, there are often better alternatives to making large numbers of API calls, especially calls that run the risk of being throttled (and a bonus complaint about AWS APIs in general: they have many different ways to report throttling).

One thing that is not compelling is the fear that one day AWS will decide to discontinue the version 1 SDK. First, because AWS never seems to deprecate/remove anything (does anyone still use SimpleDB? how about the Simple Workflow Service?). But more important is the existing code that uses version 1. I don't know how many millions of lines there are, but I suspect that it's large. And I also suspect that most of it belongs to companies that pay Amazon lots of money and would seriously reconsider cloud providers if required to update their codebases.

No comments: