blog.kdgregory.com

Saturday, February 11, 2017

MySQL: when NOT IN is not equal to NOT EXISTS

When you want to perform a difference operation between two tables, you have a choice: NOT EXISTS with a correlated subquery, or NOT IN. The latter is arguably simpler to write and makes the intent of the query more obvious. And modern database systems will optimize the two queries into similar execution plans, handling the correlation between outer and inner queries (I say “modern” because when I was working with Oracle 7.3 in the mid-90s I learned the hard way that it did not).

There is one key difference between the two constructs: if the subquery returns a NULL in its results then the NOT IN condition will fail, because null is neither equal-to nor not-equal-to any other value. But if you guard against that, they should be equivalent — indeed, some sources will tell you that NOT IN is faster and therefore preferred.
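
To make the NULL behavior concrete, here is a minimal sketch that simulates SQL's three-valued logic in Java. It is an illustration only, not a description of how MySQL evaluates queries: the class and method names are made up for this example, and a null Boolean stands in for UNKNOWN.

```java
import java.util.List;

/** Sketch of SQL three-valued logic; a null Boolean stands for UNKNOWN. */
final class ThreeValuedLogic {

    /** SQL "a = b": UNKNOWN if either operand is NULL. */
    static Boolean sqlEquals(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return Boolean.valueOf(a.equals(b));
    }

    /** SQL "value IN (candidates)": TRUE on any match, otherwise
     *  UNKNOWN if any comparison was UNKNOWN, otherwise FALSE. */
    static Boolean sqlIn(Integer value, List<Integer> candidates) {
        boolean sawUnknown = false;
        for (Integer candidate : candidates) {
            Boolean eq = sqlEquals(value, candidate);
            if (eq == null) {
                sawUnknown = true;
            } else if (eq.booleanValue()) {
                return Boolean.TRUE;
            }
        }
        return sawUnknown ? null : Boolean.FALSE;
    }

    /** SQL "NOT x": UNKNOWN stays UNKNOWN. */
    static Boolean sqlNot(Boolean x) {
        return (x == null) ? null : Boolean.valueOf(!x.booleanValue());
    }

    /** A WHERE clause keeps a row only when its condition is TRUE. */
    static boolean whereKeeps(Boolean condition) {
        return Boolean.TRUE.equals(condition);
    }
}
```

With these definitions, whereKeeps(sqlNot(sqlIn(5, Arrays.asList(1, 2)))) keeps the row, but adding a null to the list turns the IN result into UNKNOWN, and the row is filtered out even though 5 never matched anything.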

This post is about one case where it's dramatically slower, and nulls are to blame.

Consider the following two tables, which might be used to track clickstream data. Since we track both anonymous and registered users, EVENTS.USER_ID is nullable. However, when the user is not null, the secondary index has high cardinality.

create table USERS
(
  ID    integer auto_increment primary key,
  ...
);

create table EVENTS
(
  ID      integer auto_increment primary key,
  TYPE    smallint not null,
  USER_ID integer,
  ...
);

create index EVENTS_USER_IDX on EVENTS(USER_ID);

OK, now let's use these tables: from a small set of users, we want to find the ones that haven't had a particular event. Using a NOT IN, and ensuring that nulls don't appear in the inner results, the query looks like this:

select  ID
from    USERS
where   ID in (1, 7, 2431, 87142, 32768)
and     ID not in
        (
        select  USER_ID
        from    EVENTS
        where   TYPE = 7
        and     USER_ID is not null
        );

For my test dataset, the USERS table has 100,000 rows and the EVENTS table has 10,000,000, of which approximately 75% have a null USER_ID. I'm running on my laptop, which has a Core i7 processor, 12 GB of RAM, and an SSD.

And I consistently get runtimes of around 2 minutes, which is … wow.

Let's replace that NOT IN with a NOT EXISTS and correlated subquery:

select  ID
from    USERS
where   ID in (1, 7, 2431, 87142, 32768)
and     not exists
        (
        select  1
        from    EVENTS
        where   USER_ID = USERS.ID
        and     TYPE = 7
        );

This version runs in 0.01 seconds, which is more what I expected.

Time to compare execution plans. The first plan is from the NOT IN query, the second is from the NOT EXISTS.

+----+--------------------+--------+------------+----------------+-----------------+-----------------+---------+------+------+----------+--------------------------+
| id | select_type        | table  | partitions | type           | possible_keys   | key             | key_len | ref  | rows | filtered | Extra                    |
+----+--------------------+--------+------------+----------------+-----------------+-----------------+---------+------+------+----------+--------------------------+
|  1 | PRIMARY            | USERS  | NULL       | range          | PRIMARY         | PRIMARY         | 4       | NULL |    5 |   100.00 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | EVENTS | NULL       | index_subquery | EVENTS_USER_IDX | EVENTS_USER_IDX | 5       | func |  195 |    10.00 | Using where              |
+----+--------------------+--------+------------+----------------+-----------------+-----------------+---------+------+------+----------+--------------------------+

+----+--------------------+--------+------------+-------+-----------------+-----------------+---------+------------------+------+----------+--------------------------+
| id | select_type        | table  | partitions | type  | possible_keys   | key             | key_len | ref              | rows | filtered | Extra                    |
+----+--------------------+--------+------------+-------+-----------------+-----------------+---------+------------------+------+----------+--------------------------+
|  1 | PRIMARY            | USERS  | NULL       | range | PRIMARY         | PRIMARY         | 4       | NULL             |    5 |   100.00 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | EVENTS | NULL       | ref   | EVENTS_USER_IDX | EVENTS_USER_IDX | 5       | example.USERS.ID |   97 |    10.00 | Using where              |
+----+--------------------+--------+------------+-------+-----------------+-----------------+---------+------------------+------+----------+--------------------------+

Almost identical: both select rows from the USERS table, and then use a nested-loops join (“dependent subquery”) to retrieve rows from the EVENTS table. Both claim to use EVENTS_USER_IDX to select rows in the subquery. And they estimate similar numbers of rows at each step.

But look more closely at the join types. The NOT IN version uses index_subquery, while the NOT EXISTS version uses ref. Also look at the ref column: the NOT EXISTS version uses an explicit reference to the outer column, while the NOT IN uses a function. What's going on here?

The index_subquery join type indicates that MySQL will scan the index to find relevant rows for the subquery. Could that be the problem? I don't think so, because the EVENTS_USER_IDX is “narrow”: it only has one column, so the engine should not have to read a lot of blocks to find rows corresponding to the IDs from the outer query (indeed, I've tried a variety of queries to exercise this index, and all run in a few hundredths of a second).

For more information, I turned to the “extended” execution plan. To see this plan, prefix the query with explain extended, and follow it with show warnings. Here's what you get from the NOT IN query (reformatted for clarity):

/* select#1 */  select `example`.`USERS`.`ID` AS `ID` 
                from    `example`.`USERS` 
                where   ((`example`.`USERS`.`ID` in (1,7,2431,87142,32768)) 
                        and (not((`example`.`USERS`.`ID`,
                            (((`example`.`USERS`.`ID`) in EVENTS 
                            on      EVENTS_USER_IDX checking NULL
                            where   ((`example`.`EVENTS`.`TYPE` = 7) 
                                    and (`example`.`EVENTS`.`USER_ID` is not null))
                            having (`example`.`EVENTS`.`USER_ID`)))))))

I haven't been able to find an explanation of “on EVENTS_USER_IDX checking NULL” but here's what I think is happening: the optimizer believes that it is executing an IN query that can have NULL in the results; it does not consider the null check in the where clause when making this decision. As a result, it will examine the 7.5 million rows where USER_ID is null, along with the few dozen rows where it matches the values from the outer query. And by “examine,” I mean that it will read the table row and only then apply the is not null condition. Moreover, based on the time taken to run the query, I think it's doing this for every one of the candidate values in the outer query.

So, bottom line: whenever you're thinking of using an IN or NOT IN subquery on a nullable column, rethink and use EXISTS or NOT EXISTS instead.

Friday, January 27, 2017

Trusting the Internet: Picking Third-Party Libraries

Many applications today are like the human body:* a relatively small proportion of “in-house” code, leveraged by dozens if not hundreds of third-party libraries — everything from object-relational mappers to a single function that left-pads a string. And that leads to a conundrum: how do you pick the libraries that you include in your project? Or in other words, is it OK to download something from the Internet and make it a fundamental part of your business?

Sometimes, of course, you don't have a choice. If you use the JUnit testing framework, for example, you are going to get the Hamcrest library along with it (and maybe you'll feel some concern that the hamcrest.org domain is no longer registered). But what criteria did you use to pick JUnit?

I was recently faced with that very question, in looking for a library to parse and validate JSON Web Tokens in Java. There are several libraries to choose from; these are the criteria that I used to pick one, from most important to least.

  1. Following the crowd
    If 100,000 projects use a particular library without issue, chances are good that you can too. But how do you know how many projects use a library? For JavaScript projects, npm gives you numbers of downloads; ditto for Ruby projects and the gems they use. Java projects don't have it so easy: while Maven Central does keep statistics on downloads (they're available to package maintainers), that information isn't available to consumers (other than a listing of the top 10 downloads).

    One interesting technique for following the crowd is looking in your local repository, to see if the package is already there as a dependency of another library. If you can create a dependency tree for your project, look at where the candidate lives in the tree: is it close to the root or deep in the weeds? Is it included by multiple other libraries or just one? These are all signals of how the rest of the world views the library.

  2. Documentation
    I believe that care in documentation is a good proxy for care in implementation. Things I want to see are complete JavaDoc and examples. For projects hosted on GitHub, I should be able to understand the library based solely on the README (and there's no reason that a non-GitHub project should omit a README, although I admit to being guilty in that regard).

  3. Author Credibility
    This can be difficult, especially where the “author” is a corporation (although large corporations tend to do their own vetting before letting projects out under the corporate name). In the case of a sole maintainer, I Google the person's name and see what comes up. I'd like to see web pages that demonstrate deep knowledge of the subject (especially for security-related libraries). Even better are slides from a conference, because that implies that the author has at least some recognition in the community.

  4. Issue Handling
    Every library has issues. Does the maintainer respond to them in a reasonable timeframe? A large number of outstanding issues should raise a red flag, as should a maintainer that responds in a non-professional manner. You wouldn't accept that from a coworker (I hope), and by using a package you make the maintainer your coworker.

Once I have decided on a candidate library (or small number of candidates) I try it out for my use case. If it looks good, it becomes part of my application. One thing that I do not do is dig into the library source code.

The promise of open-source software is that you can download the sources and inspect them. The reality is that nobody ever does that, and nobody could, because it would be more than a full-time job. So we choose as best we can, and hope that there isn't a dependency-of-a-dependency-of-a-dependency that's going to hurt us.


* The reference is to the amount of bacteria and other organisms that don't share your DNA but live on or in your body. You'll often find a 10:1 (bacteria:human) ratio quoted, but see this article for commentary on the history and validity of that ratio (tl;dr, it's more like 60:40).

Wednesday, December 28, 2016

Server-side Authentication with Amazon Cognito IDP

When I first wrote this post, I opened with the caveat that I would not use Cognito in a production application, along with the hope that Amazon would invest in improving it. My comments made their way to the Cognito product manager at Amazon, and we spent an hour on the phone discussing my concerns. I came away with the belief that Amazon is working to improve the product and its documentation, including server-side code. And as I was looking at the documentation in preparation for that call, I saw that it has been improved since I started on this project in October.

He also pointed out some areas where Amazon recommends a different approach than I chose. I have updated this post to either incorporate his suggestions, or note them as sidebars where I prefer my original approach.

So where does that leave me?

Well, as I said in the original post, Cognito offers a compelling feature set. And in general, user management is a distraction from actual application development. So if I can offload that task, I will. But I think that there are still some very rough edges to Cognito; if you choose it for your application, be prepared to jump through some hoops.

Overview: What to expect in the rest of this post

In this post I build a simple authentication framework for a web application. It has three functions: signing up for a new account, signing in to an existing account, and verifying that a user has signed in. The example is built using Java servlets; I intentionally avoided frameworks such as Spring in order to focus on behavior. Similarly, the browser side uses simple HTML pages, with JQuery to send POST requests to the server.

All servlets return 200 for every request unless there's an uncaught exception at the server. The response body is a string that indicates the result of the operation. Depending on the result, the client-side code will either show an alert (for bad inputs) or move to the next step in the flow.

On successful sign-in the servlet stores two cookies, ACCESS_TOKEN and REFRESH_TOKEN, which are used to authorize subsequent requests. These cookies are marked “httpOnly” so that scripts running in the page cannot read them, which limits the damage from cross-site scripting attacks. The example code does not make any attempt to prevent cross-site request forgery attacks, as such prevention generally relies on data passed as page content.
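
As a rough illustration of what the “httpOnly” marking looks like on the wire, the following sketch builds a Set-Cookie header value by hand. It is not the code from the example project: the class name, the Path value, and the optional Secure attribute are assumptions made for this sketch. In a servlet you would more likely call Cookie.setHttpOnly(true), which is available from Servlet 3.0 onward.

```java
/** Sketch: builds a Set-Cookie header value by hand so the attributes are visible. */
final class CookieHeaderSketch {

    /**
     * @param name           cookie name, e.g. ACCESS_TOKEN
     * @param value          cookie value (assumed to need no escaping)
     * @param maxAgeSeconds  lifetime of the cookie in seconds
     * @param secure         true to restrict the cookie to HTTPS (an addition;
     *                       the example application sends clear-text)
     */
    static String buildSetCookie(String name, String value, int maxAgeSeconds, boolean secure) {
        StringBuilder header = new StringBuilder();
        header.append(name).append('=').append(value);
        header.append("; Max-Age=").append(maxAgeSeconds);
        header.append("; Path=/");
        // HttpOnly: the browser sends the cookie but hides it from JavaScript
        header.append("; HttpOnly");
        if (secure) {
            header.append("; Secure");
        }
        return header.toString();
    }
}
```

For example, buildSetCookie("ACCESS_TOKEN", "abc", 3600, false) produces the value ACCESS_TOKEN=abc; Max-Age=3600; Path=/; HttpOnly.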

Also on the topic of security: all communication is sent in clear-text, on the assumption that a real-world application will use HTTPS to secure all communications. Cognito provides a client-side library that exchanges secrets in a secure manner, but I'm not using it (because this is intended as a server-side example).

For those that want to follow along at home, the source code is here.

Usernames

I strongly believe in using an email address as the primary account identifier; I get annoyed every time I'm told that “kdgregory” is already in use and that I must guess at a username that's not in use. Email addresses are a unique identifier that won't change (well, usually: my Starwood Preferred Guest account seems to be irrevocably tied to my address at a former employer, even though I attempt to change it every time I check in).

Cognito also has strong opinions about email addresses: they're secondary to the actual username. It does support the ability to validate an email address and use it in place of the username, but the validation process requires the actual username. While we could generate random usernames on the server and store them in a cookie during the signup process, that's more effort than I want to expend.

Fortunately, the rules governing legal usernames allow the use of email addresses. And Cognito allows you to generate an initial password and send it via email, which prevents hijacking of an account by a user that doesn't own that address. Cognito doesn't consider this email to provide validation, which leads to some pain; I'll talk more about that later on.

Creating a User Pool

It's easy to create a user pool, but there are a few gotchas. The following points are ordered by the steps in the current documentation. At the time of writing, you can't use CloudFormation to create pools or clients, so for the example code I provide a shell script that creates a pool and client that match my needs.

  • Require the email attribute but do not mark it as an alias

    Cognito allows users to have alias identifiers that work in place of their username. However, as I mentioned above, these aliases only work if they pass Cognito's validation process. Since we'll be using the email address as the primary identifier, there's no need to mark it as an alias. But we do want to send email to the user so must save it as the email attribute in addition to the username.

  • Do not enable automatic verification of email addresses

    If you enable this feature, Cognito sends your users an email with a random number, and you'll have to provide a page/servlet where they enter this number. Note, however, that by skipping this feature you currently lose the ability to send the user a reset-password email.

  • Do not create a client secret

    When you create a client, you have the option for Cognito to create a secret hash in addition to the client ID. The documentation does not describe how to pass this hash in a request, and Cognito will throw an exception if you don't. Moreover, the JavaScript SDK doesn't support client secrets; they're only used by the Android/iOS SDKs.

Creating a User (sign-up)

Note: Amazon considers adminCreateUser() to be intended for systems where administrators add users, while signUp() is the preferred function for user-initiated signups. The flow for the two functions is quite different: the signUp() flow lets the user pick her own password, and sends a verification email. Personally, I prefer the ability to generate temporary passwords with adminCreateUser().

The sign-up servlet calls the adminCreateUser() function.

try
{
    AdminCreateUserRequest cognitoRequest = new AdminCreateUserRequest()
            .withUserPoolId(cognitoPoolId())
            .withUsername(emailAddress)
            .withUserAttributes(
                    new AttributeType()
                        .withName("email")
                        .withValue(emailAddress),
                    new AttributeType()
                        .withName("email_verified")
                        .withValue("true"))
            .withDesiredDeliveryMediums(DeliveryMediumType.EMAIL)
            .withForceAliasCreation(Boolean.FALSE);

    cognitoClient.adminCreateUser(cognitoRequest);
    reportResult(response, Constants.ResponseMessages.USER_CREATED);
}
catch (UsernameExistsException ex)
{
    logger.debug("user already exists: {}", emailAddress);
    reportResult(response, Constants.ResponseMessages.USER_ALREADY_EXISTS);
}
catch (TooManyRequestsException ex)
{
    logger.warn("caught TooManyRequestsException, delaying then retrying");
    ThreadUtil.sleepQuietly(250);
    doPost(request, response);
}

There are a couple of variables and functions that are used by this snippet but set elsewhere:

  • The cognitoClient variable is defined in an abstract superclass; it holds an instance of AWSCognitoIdentityProviderClient. Like other AWS “client” classes, this class is threadsafe and, I assume, holds a persistent connection to the service. As such, you're encouraged to create a single client and reuse it.
  • The emailAddress variable is populated from a request parameter. I don't want to clutter my examples with boilerplate code to retrieve parameters, so you can assume in future snippets that any variable not explicitly described comes from a parameter.
  • I have two functions in the abstract superclass that retrieve configuration values from the servlet context (which is loaded from the web.xml file). Here I call cognitoPoolId(), which returns the AWS-assigned pool ID. Later you'll see cognitoClientId().

Moving on to the code itself: most Cognito client functions take a request object and return a response object. The request objects are constructed using the Builder pattern: each “with” function adds something to the request and returns the updated object so that calls can be chained.
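
The chaining style can be sketched with a small stand-alone class. SketchRequest is hypothetical and is not part of the AWS SDK; it only mimics the shape of the SDK's request objects, where each “with” function stores a value and returns the same object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical request class that mimics the SDK's chained "with" functions. */
final class SketchRequest {
    private String userPoolId;
    private String username;
    private final List<String> deliveryMediums = new ArrayList<String>();

    SketchRequest withUserPoolId(String value) {
        this.userPoolId = value;
        return this;            // returning "this" is what makes chaining possible
    }

    SketchRequest withUsername(String value) {
        this.username = value;
        return this;
    }

    SketchRequest withDesiredDeliveryMediums(String... values) {
        this.deliveryMediums.addAll(Arrays.asList(values));
        return this;
    }

    /** Summarizes the accumulated state, for demonstration only. */
    String describe() {
        return userPoolId + "/" + username + "/" + deliveryMediums;
    }
}
```

A chained call such as new SketchRequest().withUserPoolId("pool").withUsername("u@example.com").withDesiredDeliveryMediums("EMAIL") builds up one object step by step, just as the adminCreateUser() request is built up earlier in this section.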

Most of the information I provide in the request has to do with the email address. It's the username, as I said above. But I also have to explicitly store it in the email attribute, or Cognito won't be able to send mail to the user. And we want that, so that Cognito will generate and send a temporary password.

I'm also setting the email_verified attribute. This attribute is normally set by Cognito itself when the user performs the verification flow, and it's required to send follow-up emails such as a password reset message. Unfortunately, at this point Cognito doesn't consider the temporary password email to be a suitable verification of the address. So instead I forcibly mark the address as verified when creating the account. If the address doesn't actually exist, the user won't receive her temporary password, and therefore won't be able to sign in.

Exception handling is the other big part of this snippet. As I said before, Cognito uses a mix of exceptions and return codes; the latter signal a “normal” flow, while exceptions are for things that break the flow — even if they're a normal part of user authentication. If you look at the documentation for adminCreateUser() you'll see an almost overwhelming number of possible exceptions. However, many of these are irrelevant to the way that I'm creating users. For example, since I let Cognito generate temporary passwords, there's no need to handle an InvalidPasswordException.

For this example, the only important “creation” exception is thrown when there's already a user with that email address. My example responds to this by showing an alert on the client, but a real application should initiate a “forgotten password” flow.

The second exception that I'm handling, TooManyRequestsException, can be thrown by any operation; you always need to handle it. The documentation isn't clear on the purpose of this exception, but I'm assuming that it's thrown when you exceed the AWS rate limit for requests (versus a security measure specific to repeated signup attempts). My example uses a rather naive solution: retry the operation after sleeping for 250 milliseconds. If you're under heavy load, this could delay the response to the user for several seconds, which could cause them to think your site is down; you might prefer to simply tell them to try again later.
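
If you do want to retry, one way to keep the delay bounded is to cap the number of attempts and back off between them. The following is a sketch of that idea and not the example project's code: ThrottledException is a hypothetical stand-in for the SDK's TooManyRequestsException, and the attempt count and delays are arbitrary choices.

```java
import java.util.function.Supplier;

/** Sketch: retry a throttled operation a limited number of times with doubling delays. */
final class BoundedRetrySketch {

    /** Hypothetical stand-in for the SDK's TooManyRequestsException. */
    static final class ThrottledException extends RuntimeException {
        ThrottledException(String message) {
            super(message);
        }
    }

    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMillis = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            }
            catch (ThrottledException ex) {
                if (attempt >= maxAttempts) {
                    throw ex;           // give up; the caller can tell the user to try later
                }
                pause(delayMillis);
                delayMillis *= 2;       // wait twice as long before the next attempt
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ex);
        }
    }
}
```

With three attempts and a 250 millisecond initial delay, the worst case adds 750 milliseconds of waiting before the exception reaches the caller, at which point the servlet can report a “try again later” result rather than retrying forever.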

A successful call to adminCreateUser() is only the first part of signing up a new user. The second step is for the user to log in using the temporary password that Cognito sends to their email address. My example page responds to the “user created” response with a client-side redirect to a confirmation page, which has fields for the email address, temporary password, and permanent password. These values are then submitted to the confirmation servlet.

As far as Cognito is concerned, there's no code-level difference between signing in with the temporary password and the final password: you call the adminInitiateAuth() method. The difference is in the response: when you sign in with the temporary password you'll be challenged to provide the final password.

This ends up being a fairly large chunk of code; I've split it into two pieces. The first chunk is straightforward; it handles the initial authentication attempt.

Map<String,String> initialParams = new HashMap<String,String>();
initialParams.put("USERNAME", emailAddress);
initialParams.put("PASSWORD", tempPassword);

AdminInitiateAuthRequest initialRequest = new AdminInitiateAuthRequest()
        .withAuthFlow(AuthFlowType.ADMIN_NO_SRP_AUTH)
        .withAuthParameters(initialParams)
        .withClientId(cognitoClientId())
        .withUserPoolId(cognitoPoolId());

AdminInitiateAuthResult initialResponse = cognitoClient.adminInitiateAuth(initialRequest);

The key thing to note here is AuthFlowType.ADMIN_NO_SRP_AUTH. Cognito supports several authentication flows; later we'll use the same function to refresh the access token. Client SDKs use the Secure Remote Password (SRP) flow; on the server, where we can secure the credentials, we use the ADMIN_NO_SRP_AUTH flow.

As with the previous operation, we need the pool ID. We also need the client ID; you can create multiple clients per pool, and track which user uses which client (although it's still a single pool of users). You must create at least one client (known in the console as an app). As I noted earlier, both IDs are configured in web.xml and retrieved via functions defined in the abstract superclass.

As I said, the difference between an initial sign-in and a normal sign-in is in the response. In the case of a normal sign-in, which we'll see later, the response contains credentials. In the case of an initial sign-in, it contains a challenge:

if (! ChallengeNameType.NEW_PASSWORD_REQUIRED.name().equals(initialResponse.getChallengeName()))
{
    throw new RuntimeException("unexpected challenge: " + initialResponse.getChallengeName());
}

Here I expect the “new password required” challenge, and am not prepared for anything else (since the user should only arrive here after a password change). In a real-world application I'd use a nicer error response rather than throwing an exception.

We respond to this challenge with adminRespondToAuthChallenge(), providing the temporary and final passwords. One thing to note is withSession(): Cognito needs to link the challenge response with the challenge request, and this is how it does that.

Map<String,String> challengeResponses = new HashMap<String,String>();
challengeResponses.put("USERNAME", emailAddress);
challengeResponses.put("PASSWORD", tempPassword);
challengeResponses.put("NEW_PASSWORD", finalPassword);

AdminRespondToAuthChallengeRequest finalRequest = new AdminRespondToAuthChallengeRequest()
        .withChallengeName(ChallengeNameType.NEW_PASSWORD_REQUIRED)
        .withChallengeResponses(challengeResponses)
        .withClientId(cognitoClientId())
        .withUserPoolId(cognitoPoolId())
        .withSession(initialResponse.getSession());

AdminRespondToAuthChallengeResult challengeResponse = cognitoClient.adminRespondToAuthChallenge(finalRequest);
if (StringUtil.isBlank(challengeResponse.getChallengeName()))
{
    updateCredentialCookies(response, challengeResponse.getAuthenticationResult());
    reportResult(response, Constants.ResponseMessages.LOGGED_IN);
}
else
{
    throw new RuntimeException("unexpected challenge: " + challengeResponse.getChallengeName());
}

Assuming that the provided password was acceptable, and there were no other errors (see below), then we should get a response that (1) has a blank challenge, and (2) has valid credentials. I don't handle the case where we get a new challenge (which in a real-world app might be for multi-factor authentication). I store the returned credentials as cookies (another method in the abstract superclass), and return a message indicating that the user is logged in.

Now for the “other errors.” Unlike the initial signup servlet, there are a bunch of exceptions that might apply to this operation. TooManyRequestsException, of course, is possible for any call, but here are the ones specific to my flow:

  • InvalidPasswordException if you've set rules for passwords and the user's permanent password doesn't satisfy those rules. Cognito lets you require a combination of uppercase letters, lowercase letters, numbers, and special characters, along with a minimum length.
  • UserNotFoundException if the user enters a bogus email address. This could be an honest accident, or it could be a fishing expedition. A security-conscious site should attempt to discourage such attacks; one simple approach is to delay the response after every failed request (but note that could lead to a denial-of-service attack against your site!).
  • NotAuthorizedException if the user provides an incorrect temporary password. Again, this could be an honest mistake or an attack; do not give the caller any indication that they have a valid user but invalid password (I return the same “no such user” message for this exception and the previous one).
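
The “same message for both” rule from the list above can be sketched as a single mapping function. The enum and the message strings here are hypothetical; the example project uses its own response constants, which are not reproduced here.

```java
/** Sketch: map sign-in failures to client messages without revealing which accounts exist. */
final class SignInFailureSketch {

    /** Hypothetical failure kinds, standing in for the Cognito exceptions listed above. */
    enum Failure { USER_NOT_FOUND, NOT_AUTHORIZED, INVALID_PASSWORD }

    static String messageFor(Failure failure) {
        switch (failure) {
            case USER_NOT_FOUND:
            case NOT_AUTHORIZED:
                // identical response, so a caller cannot probe for valid addresses
                return "NO_SUCH_USER";
            case INVALID_PASSWORD:
                // safe to be specific: this is about the new password's format
                return "INVALID_PASSWORD";
            default:
                throw new IllegalArgumentException("unhandled failure: " + failure);
        }
    }
}
```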

Before wrapping up this section, I want to point out that there are cases where Cognito throws an exception but the user has been created. There's no good way to recover from this, other than to provide the user with a “lost password” flow.

Authentication (sign-in)

You've already seen the sign-in code, as part of sign-up confirmation. Here I want to focus on the response handling.

Map<String,String> authParams = new HashMap<String,String>();
authParams.put("USERNAME", emailAddress);
authParams.put("PASSWORD", password);

AdminInitiateAuthRequest authRequest = new AdminInitiateAuthRequest()
        .withAuthFlow(AuthFlowType.ADMIN_NO_SRP_AUTH)
        .withAuthParameters(authParams)
        .withClientId(cognitoClientId())
        .withUserPoolId(cognitoPoolId());

AdminInitiateAuthResult authResponse = cognitoClient.adminInitiateAuth(authRequest);
if (StringUtil.isBlank(authResponse.getChallengeName()))
{
    updateCredentialCookies(response, authResponse.getAuthenticationResult());
    reportResult(response, Constants.ResponseMessages.LOGGED_IN);
    return;
}
else if (ChallengeNameType.NEW_PASSWORD_REQUIRED.name().equals(authResponse.getChallengeName()))
{
    logger.debug("{} attempted to sign in with temporary password", emailAddress);
    reportResult(response, Constants.ResponseMessages.FORCE_PASSWORD_CHANGE);
}
else
{
    throw new RuntimeException("unexpected challenge on signin: " + authResponse.getChallengeName());
}

With my example pool configuration, once the user has completed signup there shouldn't be any additional challenges. However, we have to handle the “new password required” flow, because the user might not complete signup in one sitting. She might instead attempt to log in with her temporary password via the normal sign-in page, so we return a code for that case and let the sign-in page redirect to the confirmation page.

Exception handling is identical to the signup confirmation code, with the exception of InvalidPasswordException (since we don't change the password here).

Authorization (token validation)

You've seen updateCredentialCookies() called whenever authentication is successful; it takes the authentication result and stores the relevant tokens as cookies (so that they'll be provided on every request). There are several tokens in the result; I care about two of them:

  • The access token represents a signed-in user, and will expire an hour after sign-in.
  • The refresh token allows the application to generate a new access token without forcing the user to re-authenticate. The lifetime of refresh tokens is measured in days or years (by default, 30 days).

These tokens aren't simply random strings; they're JSON Web Tokens, which include a base64-encoded JSON blob that describes the user:

{
 "sub": "1127b8bd-c828-4a00-92ad-40a786cac946",
 "token_use": "access",
 "scope": "aws.cognito.signin.user.admin",
 "iss": "https:\/\/cognito-idp.us-east-1.amazonaws.com\/us-east-1_rCQ6gAd1Q",
 "exp": 1482239852,
 "iat": 1482236252,
 "jti": "96732ef7-fc62-4265-843e-343a43b6caf7",
 "client_id": "5co5s8e43krcdps2lrp4fo301i",
 "username": "test0716@mailinator.com"
}

You could use the token as the sole indication of whether the user is logged in, by comparing the exp field to the current timestamp (note that exp is seconds since the epoch, while System.currentTimeMillis() is milliseconds, so multiply the former by 1000 before comparing). Each token is signed; you verify this signature using a third-party library and keys downloaded from https://cognito-idp.{region}.amazonaws.com/{userPoolId}/.well-known/jwks.json.
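
The expiration check described above might look like the following sketch. It only decodes the payload and compares timestamps; it does not verify the signature, so on its own it proves nothing about who issued the token. The class name is made up, and the regular expression is a shortcut that assumes exp is a plain integer; a real application would use a JSON parser and a JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: read the "exp" claim from a JWT payload. Does NOT verify the signature. */
final class TokenExpirySketch {

    // shortcut for the sketch: assumes "exp" is a plain integer in the JSON payload
    private static final Pattern EXP_PATTERN = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");

    /** Returns the JSON payload, which is the second of three dot-separated parts. */
    static String decodePayload(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected three dot-separated parts");
        }
        byte[] decoded = Base64.getUrlDecoder().decode(parts[1]);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    /** Converts "exp" (seconds since the epoch) to milliseconds. */
    static long expiresAtMillis(String jwt) {
        Matcher matcher = EXP_PATTERN.matcher(decodePayload(jwt));
        if (!matcher.find()) {
            throw new IllegalArgumentException("payload has no exp claim");
        }
        return Long.parseLong(matcher.group(1)) * 1000L;
    }

    static boolean isExpired(String jwt, long nowMillis) {
        return nowMillis >= expiresAtMillis(jwt);
    }
}
```

A caller would pass System.currentTimeMillis() as nowMillis; for the sample payload shown earlier, expiresAtMillis() returns 1482239852000.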

However, there are a couple of significant limitations to using the token as sole source of authorization, both caused by the fixed one hour expiration. The first is that there's no way to force logout before the token expires. Cognito provides a function to invalidate tokens, adminUserGlobalSignOut(), but it's only relevant if you request token validation from AWS. The second is that one hour is excessively short for some purposes, and you'll be forced to refresh the token.

I'm going to show a different approach to authorization: asking AWS to validate the token as part of retrieving information about the user. You'll find this code in the ValidatedAction servlet. In a normal application, it would be common code that's called by any servlet that needs validation.

try
{
    GetUserRequest authRequest = new GetUserRequest().withAccessToken(accessToken);
    GetUserResult authResponse = cognitoClient.getUser(authRequest);

    logger.debug("successful validation for {}", authResponse.getUsername());
    tokenCache.addToken(accessToken);
    reportResult(response, Constants.ResponseMessages.LOGGED_IN);
}
catch (NotAuthorizedException ex)
{
    if (ex.getErrorMessage().equals("Access Token has expired"))
    {
        attemptRefresh(refreshToken, response);
    }
    else
    {
        logger.warn("exception during validation: {}", ex.getMessage());
        reportResult(response, Constants.ResponseMessages.NOT_LOGGED_IN);
    }
}
catch (TooManyRequestsException ex)
{
    logger.warn("caught TooManyRequestsException, delaying then retrying");
    ThreadUtil.sleepQuietly(250);
    doPost(request, response);
}

Before calling this code I retrieve accessToken and refreshToken from their cookies. Ignore tokenCache for now; I'll talk about it below.

The response from getUser() includes all of the user's attributes; you could use them to personalize your web page, or provide profile information. Here, all I care about is whether the request was successful; if it was, I return the “logged in” status.

As before, we have to catch exceptions, and use the “delay and retry” technique if we get TooManyRequestsException. NotAuthorizedException is the one that we need to think about. Unfortunately, it can be thrown for a variety of reasons, ranging from an expired token to one that's completely bogus. More unfortunately, in order to tell the difference we have to look at the actual error message — not something that I like to do, but Amazon didn't provide different exception classes for the different causes.
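The “delay and retry” code above retries without limit, which is fine for a demo. As a variation, here is a minimal sketch of a bounded retry with a doubling delay. Everything in it is an assumption made for illustration: the class name, the method signature, and ThrottledException, which stands in for the SDK's TooManyRequestsException so that the sketch compiles without the AWS SDK.

```java
import java.util.function.Supplier;

/** Hypothetical helper: retries an operation a limited number of times when throttled. */
final class BoundedRetry
{
    /** Stand-in for TooManyRequestsException; not an AWS class. */
    static class ThrottledException extends RuntimeException
    {
        ThrottledException(String message) { super(message); }
    }

    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts, long initialDelayMillis)
    {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation.get();
            }
            catch (ThrottledException ex)
            {
                // give up once the attempt budget is spent
                if (attempt >= maxAttempts)
                    throw ex;
                sleep(delay);
                delay *= 2;     // wait twice as long before the next attempt
            }
        }
    }

    private static void sleep(long millis)
    {
        try
        {
            Thread.sleep(millis);
        }
        catch (InterruptedException ex)
        {
            // preserve the interrupt flag and stop retrying
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ex);
        }
    }
}
```

With the real SDK you would catch TooManyRequestsException instead, and wrap the call to getUser() or adminInitiateAuth() in the supplier.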

If the access token has expired, we need to move on to the refresh operation (you'll also need to do this if you're validating tokens based on their contents and the access token has expired).

private void attemptRefresh(String refreshToken, HttpServletResponse response)
throws ServletException, IOException
{
    try
    {
        Map<String,String> authParams = new HashMap<String,String>();
        authParams.put("REFRESH_TOKEN", refreshToken);

        AdminInitiateAuthRequest refreshRequest = new AdminInitiateAuthRequest()
                                          .withAuthFlow(AuthFlowType.REFRESH_TOKEN)
                                          .withAuthParameters(authParams)
                                          .withClientId(cognitoClientId())
                                          .withUserPoolId(cognitoPoolId());

        AdminInitiateAuthResult refreshResponse = cognitoClient.adminInitiateAuth(refreshRequest);
        if (StringUtil.isBlank(refreshResponse.getChallengeName()))
        {
            logger.debug("successfully refreshed token");
            updateCredentialCookies(response, refreshResponse.getAuthenticationResult());
            reportResult(response, Constants.ResponseMessages.LOGGED_IN);
        }
        else
        {
            logger.warn("unexpected challenge when refreshing token: {}", refreshResponse.getChallengeName());
            reportResult(response, Constants.ResponseMessages.NOT_LOGGED_IN);
        }
    }
    catch (TooManyRequestsException ex)
    {
        logger.warn("caught TooManyRequestsException, delaying then retrying");
        ThreadUtil.sleepQuietly(250);
        attemptRefresh(refreshToken, response);
    }
    catch (AWSCognitoIdentityProviderException ex)
    {
        logger.debug("exception during token refresh: {}", ex.getMessage());
        reportResult(response, Constants.ResponseMessages.NOT_LOGGED_IN);
    }
}

Note that refreshing a token uses the same function — adminInitiateAuth() — as signin. The difference is that here we use AuthFlowType.REFRESH_TOKEN as the type of authentication, and pass REFRESH_TOKEN as an auth parameter. As before, we have to be prepared for a challenge, although we don't expect any (in a real application, it's possible that the user could request a password change while still logged in, so there may be real challenges).

We do the usual handling of TooManyRequestsException, and consider any other exception to be an error. Assuming that the refresh succeeds, we save the new access token in the response cookies.

All well and good, but let's return to TooManyRequestsException. If we were to authenticate every user action by going to AWS then we'd be sure to hit a request limit. Validating credentials based on their content solves this problem, but I've taken a different approach: I maintain a cache of tokens, associated with expiration dates. Rather than the one hour expiration provided by AWS, I use a shorter time interval; this allows me to check for a forced logout.

You'll find the code in CredentialsCache; I'm going to skip over a detailed description here, because in a real-world application, I would probably just accept the one-hour timeout for access tokens and validate based on their contents; the intent of the example code is to show calls to AWS.
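For a rough picture of the idea without opening the source, here is a minimal sketch of a token cache with a short expiration. It is not the CredentialsCache class from the example; the class name, the isValid() method, and the injected clock are all assumptions made to keep the sketch small and testable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical cache: remembers recently validated tokens for a limited time. */
final class ExpiringTokenCache
{
    // token -> time (milliseconds since the epoch) at which the cache entry expires
    private final ConcurrentHashMap<String,Long> expirations = new ConcurrentHashMap<>();
    private final long lifetimeMillis;
    private final LongSupplier clock;

    ExpiringTokenCache(long lifetimeMillis, LongSupplier clock)
    {
        this.lifetimeMillis = lifetimeMillis;
        this.clock = clock;
    }

    /** Records a token that AWS has just validated. */
    void addToken(String token)
    {
        expirations.put(token, clock.getAsLong() + lifetimeMillis);
    }

    /** True if the token was validated recently enough to skip a call to AWS. */
    boolean isValid(String token)
    {
        Long expiresAt = expirations.get(token);
        if (expiresAt == null)
            return false;

        if (clock.getAsLong() >= expiresAt)
        {
            // stale entry: drop it so the caller revalidates with AWS
            expirations.remove(token, expiresAt);
            return false;
        }
        return true;
    }
}
```

In production code the clock would be System::currentTimeMillis; a cache miss or a stale entry means falling back to the getUser() call shown earlier, which is where a forced logout would be detected.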

Additional Features

If you've been following along at home, you now have a basic authentication system for your website, without having to manage the users yourself. However, there's plenty of room for improvement, and here are a few things to consider.

Password reset

Users forget their passwords, and will expect you to reset them so that they can log in. Cognito provides the adminResetUserPassword() function to force-reset passwords, as well as forgotPassword(). The former sends the user a new temporary password, and subsequent attempts to log in will receive a challenge. The latter sends an email with a confirmation code, but (at least at this time) does not prevent the user from logging in with the original credentials.

The bigger concern is not which function you use, but the entire process around password resets. You don't want to simply reset a password just because someone on the Internet clicked a button. Instead, you should send the user an email that allows her to confirm the password reset, typically by using a time-limited link that triggers the reset. Please don't redirect the user to a sign-in page as a result of this link; doing so conditions your users to be vulnerable to phishing attacks. Instead, let the link reset the password (which will generate an email with a new temporary password) and tell your user to log in normally once they receive that email.

Multi-factor authentication (MFA)

Multi-factor authentication requires a user to present credentials based on something that she knows (i.e., a password) as well as something that she has. One common approach for the latter requirement is a time-based token, either via a dedicated device or an app like Google Authenticator. These devices hold a secret key that's shared with the server, and generate a unique numeric code based on the current time (typically changing every 30 seconds). As long as the user has physical possession of the device, you know that it's her logging in.
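To make the time-based scheme concrete, here is a minimal sketch of the standard algorithm behind such tokens (TOTP, RFC 6238, using HMAC-SHA1 and a 30-second step). It illustrates the general technique only and has nothing to do with Cognito's API; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: generates RFC 6238 time-based codes with HMAC-SHA1. */
final class Totp
{
    private static final long STEP_SECONDS = 30;

    static String generate(byte[] sharedSecret, long epochSeconds, int digits)
    {
        // both sides derive the same counter from the current time
        long counter = epochSeconds / STEP_SECONDS;
        byte[] message = ByteBuffer.allocate(8).putLong(counter).array();

        byte[] hash;
        try
        {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
            hash = mac.doFinal(message);
        }
        catch (GeneralSecurityException ex)
        {
            throw new IllegalStateException("HmacSHA1 not available", ex);
        }

        // dynamic truncation: the low 4 bits of the last byte select a 4-byte window
        int offset = hash[hash.length - 1] & 0x0f;
        int binary = ((hash[offset] & 0x7f) << 24)
                   | ((hash[offset + 1] & 0xff) << 16)
                   | ((hash[offset + 2] & 0xff) << 8)
                   |  (hash[offset + 3] & 0xff);

        // keep the requested number of decimal digits, left-padded with zeros
        int code = binary % (int) Math.pow(10, digits);
        return String.format("%0" + digits + "d", code);
    }
}
```

The server runs the same computation with its copy of the secret and compares the result to what the user typed, usually allowing a step or two of clock drift.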

Unfortunately, this isn't how Cognito does MFA (even though it is how the AWS Console works). Instead, Cognito sends a code via SMS to the user's cellphone. This means that you must require the user's phone number as an attribute, and verify that phone number when the user signs up.

Assuming that you do this, the response from adminInitiateAuth() will be a challenge of type SMS_MFA. Cognito will send the user a text message with a secret code, and you need a page to accept the secret code and provide it in the challenge response along with the username. I haven't implemented this, but you can see the general process in the Android SDK function CognitoUser.respondToMfaChallenge().
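As a starting point, here is an untested sketch of the challenge response, assuming the AWS SDK for Java v1 classes used elsewhere in this post. The helper class is hypothetical; it only builds the response map, and the SDK call itself appears as a comment so that the sketch compiles without the SDK. Treat the request-building lines as an assumption to check against the SDK documentation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: builds the responses for an SMS_MFA challenge. */
final class MfaChallenge
{
    static Map<String,String> smsMfaResponses(String username, String code)
    {
        Map<String,String> responses = new HashMap<>();
        responses.put("USERNAME", username);
        responses.put("SMS_MFA_CODE", code);    // the code that the user received by SMS
        return responses;
    }

    // Assumed usage with the SDK (not compiled here):
    //
    // AdminRespondToAuthChallengeRequest mfaRequest = new AdminRespondToAuthChallengeRequest()
    //         .withChallengeName(ChallengeNameType.SMS_MFA)
    //         .withChallengeResponses(MfaChallenge.smsMfaResponses(username, code))
    //         .withSession(authResponse.getSession())
    //         .withClientId(cognitoClientId())
    //         .withUserPoolId(cognitoPoolId());
    // AdminRespondToAuthChallengeResult mfaResponse = cognitoClient.adminRespondToAuthChallenge(mfaRequest);
}
```

Here authResponse is the result of the adminInitiateAuth() call that returned the challenge; its session value has to be carried between the sign-in page and the page that accepts the code.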

Federation with other identity providers

Jeff Atwood calls OpenID the driver's license of the Internet. If you haven't used an OpenID-enabled site, the basic premise is this: you already have credentials with a known Internet presence such as Google, so why not delegate authentication to them? In the years since Stack Overflow adopted OpenID as its primary authentication mechanism, many other sites have followed suit; for example, if I want to 3D print something at Shapeways, I log in with my Google account.

However, federated identities are a completely different product than the Cognito Identity Provider (IDP) API that I've been describing. You can't create a user in Cognito IDP and then delegate authentication to another provider.

Instead, Cognito federated identities are a way to let users establish their own identities, which take the form of a unique identifier that is associated with their third-party login (and in this case, Cognito IDP is considered a third party). You can use this identifier as-is, or you can associate an AWS role with the identity pool. Given that you have no control over who belongs to the pool, you don't want to grant many permissions via this role: access to Cognito Sync is one valid use, as is (perhaps!) read-only access to an S3 bucket.