Saturday, August 12, 2017

Announcing log4j-aws-appenders

A few months ago I started a "weekend project" to enable logging from my application to CloudWatch. I had used the AWS-provided log appender when working with AWS Lambda, and liked its convenience. For applications running on EC2 instances, however, the CloudWatch Logs Agent was the recommended way to go. I looked around for an appender that I could use on EC2, but all I found was one for Log4J 2.0 (I assumed that the Lambda appender uses some Lambda-specific features).

So, as I said, weekend project. Except that I started adding features and refining how the appender worked, based on my use with a semi-production project (runs 24/7, but not business-critical at the moment). At this point it's been running apparently bug-free for weeks, and I can't think of any features that I want to add, so it's time to release.

The JAR is available on Maven Central, so you can simply add it to your project POM:

<dependency>
    <groupId>com.kdgregory.log4j</groupId>
    <artifactId>aws-appenders</artifactId>
    <version>1.0.0</version>
</dependency>

Then you need to add the appender to your Log4J config:

log4j.rootLogger=WARN, console

log4j.logger.com.example.log4j=DEBUG, cloudwatch
log4j.additivity.com.example.log4j=true

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d [%t] %-5p %c %x - %m%n

log4j.appender.cloudwatch=com.kdgregory.log4j.aws.CloudWatchAppender
log4j.appender.cloudwatch.layout=org.apache.log4j.PatternLayout
log4j.appender.cloudwatch.layout.ConversionPattern=%d [%t] %-5p %c %x - %m%n

log4j.appender.cloudwatch.logGroup=ExampleCloudwatchLog
log4j.appender.cloudwatch.logStream={startupTimestamp}-{sequence}
log4j.appender.cloudwatch.batchDelay=2500
log4j.appender.cloudwatch.rotationMode=daily

Note that I create a default ConsoleAppender, and only attach the CloudWatchAppender to my program's package (com.example.log4j). You may prefer to send everything to CloudWatch, but if you do, beware that the AWS SDK does its own logging; you won't want to use DEBUG level for it or the Apache HTTP client:

log4j.logger.org.apache.http=ERROR
log4j.logger.com.amazonaws=ERROR

The second thing to note is the logStream configuration parameter: it (and logGroup) can use substitution variables. Here I'm writing a new stream for each application run, rotated daily, with a sequence number to keep track of the different streams.
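To make the substitution idea concrete, here's a minimal sketch of how a pattern like {startupTimestamp}-{sequence} might expand into stream names. This is not the appender's actual code: the class name, the expand() method, and the timestamp format are all assumptions for illustration, so check the project documentation for the real behavior.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical illustration only; not the library's implementation.
class StreamNameSketch {

    // Replaces the two substitution variables used in the config above.
    // The yyyyMMddHHmmss timestamp format is an assumption.
    static String expand(String pattern, Date startup, int sequence) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmss");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return pattern
                .replace("{startupTimestamp}", format.format(startup))
                .replace("{sequence}", String.valueOf(sequence));
    }

    public static void main(String[] args) {
        Date startup = new Date(0L);  // fixed instant so the output is repeatable
        // the timestamp stays the same for the whole run; only the sequence changes
        System.out.println(expand("{startupTimestamp}-{sequence}", startup, 0));
        System.out.println(expand("{startupTimestamp}-{sequence}", startup, 1));
    }
}
```

The point is that the timestamp identifies the application run while the sequence number distinguishes the streams written during that run.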

For more information, head over to the project on GitHub. Feel free to submit issues if you find problems or want an enhancement; I can't guarantee turnaround time for enhancements, but will try to get bugs fixed within a few days.

Next up: an appender for Kinesis Firehose in order to use Kibana with ElasticSearch.

Wednesday, August 9, 2017

Managing Secrets with KMS

Update, April 2018: Amazon just introduced AWS Secrets Manager, which allows you to securely store, retrieve, and version secrets with attached metadata. It also provides direct integration with RDS (MySQL, Postgres, and Aurora only), allowing you to rotate passwords for these services without a code update. Unless you really want to track your secrets in source control, this is a better solution than what's described in this post.


Managing secrets — database passwords, webservice logins, and the like — is one of the more painful parts of software development and deployment. You don't want them to appear in plaintext, because that's a security hole waiting to be exploited. Yet you want them to be stored in your source control system, so that you can track changes. In order to use the secrets you need to decrypt them, but storing a decryption key in source control is equivalent to storing the secrets themselves in plaintext.

I've seen several ways to solve this problem, and liked none of them. On one end of the spectrum are tools like BlackBox, which try to federate user-held keys so that those users can encrypt or decrypt a file of secrets. It was the primary form of secret-sharing at one company I worked at, and I found it quite brittle, with users quietly losing the ability to decrypt files (I suspect as a result of a bad merge).

On the other end of the spectrum is something like HashiCorp Vault, which is a service that provides encrypted secret storage as one of its many capabilities. But you have to manage the Vault server(s) yourself, and you still need to manage the secret that's used to authenticate yourself (or your application) with the service.

One alternative that I do like is Amazon's Key Management Service (KMS). With KMS, you create a “master key” that is stored in Amazon's data center and never leaves. Encryption and decryption are web services, with access controlled via the Amazon Identity and Access Management service. The big benefit of this service is that you can assign the decryption role to an EC2 instance or Lambda function, so that you never need to store physical credentials. The chief drawback is that service-based encryption is limited to 4k worth of data, and you'll pay for each request; for managing configuration secrets, this shouldn't be an issue.

In this post I'm going to show two examples of using KMS. The first is simple command-line encryption and decryption, useful for exchanging secrets between coworkers over an untrusted medium like email. The second shows how KMS can be used for application configuration. To follow along you'll need to create a key, which will cost you $1 for each month or fraction thereof that the key exists, plus a negligible amount per request.

Command-line Encryption and Decryption

Security professionals may suffer angina at the very idea of sharing passwords, but there are times when it's the easiest way to accomplish a task. However, the actual process of sharing is a challenge. The most secure way is to write the password on a piece of paper (with a felt-tip pen so that you don't leave an imprint on the sheet below), physically hand that sheet to the recipient, and expect her to burn it after use. But I can't read my own writing, much less expect others to do so. And physical sharing only works when the people are colocated, otherwise the delays become annoying and you have to trust the person carrying the message.

Sending plaintext passwords over your instant messaging service is a bad idea. I trust Slack as much as anybody, but it saves everything that you send, meaning that you're one data breach away from being forced to change all of your passwords. You could use GPG to encrypt the secret, but that requires that the recipient have your public key (mine is here, but do you have faith that I will have control over that server when “I” send you a message?).

If both of you have access to the same KMS master key, however, there is a simple solution:

> aws kms encrypt --key-id alias/example --output text --query CiphertextBlob --plaintext "Hello, world!"
AQICAHhJ6Eby+GBrQVV7F+CECJDvJ9pMoXIVzuATRXZH67SbpgEIMhJrjZwJwV7Ew9xD9dhqAAAAazBpBgkqhkiG9w0BBwagXDBaAgEAMFUGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMp2GUXayB8nDensi1AgEQgCi2kSz2LdSXHw9WOONhBwA+jadJaLL6QgwbdeNMbz3EF/xwRbqOJUV+

That string is a Base64-encoded blob of ciphertext that holds both the encrypted plaintext and an identifier for the key, which is why the decrypt command doesn't need a --key-id argument. You can paste it into an email or instant messenger window, and the person at the other end simply needs to decode the Base64 and decrypt it.

> echo "encrypted string goes here" | base64 -d > /tmp/cipherblob

> aws kms decrypt --ciphertext-blob fileb:///tmp/cipherblob
{
    "Plaintext": "SGVsbG8sIHdvcmxkIQ==",
    "KeyId": "arn:aws:kms:us-east-1:717623742438:key/dc46c8c3-2269-49ef-befd-b244c7f364af"
}

Well, that's almost correct. I showed the complete output to highlight that while encrypt and decrypt work with binary data, AWS uses JSON as its transport container. That means that the plaintext remains Base64-encoded. To actually decrypt, you would use the following command, which extracts the Base64-encoded plaintext from the response and pipes it through the Base64 decoder:

> aws kms decrypt --ciphertext-blob fileb:///tmp/cipherblob --output text --query Plaintext | base64 -d
Hello, world!
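If you're consuming the CLI's JSON output from a program rather than a shell pipeline, the same final step looks like this in Java. It's a minimal sketch using only the standard library; the class and method names are mine, and it assumes the secret is UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PlaintextDecodeSketch {

    // Turns the Base64 "Plaintext" field from the JSON response back into text.
    // Assumes the original secret was UTF-8 encoded.
    static String decodePlaintext(String base64Plaintext) {
        byte[] bytes = Base64.getDecoder().decode(base64Plaintext);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // the Plaintext value from the decrypt response shown earlier
        System.out.println(decodePlaintext("SGVsbG8sIHdvcmxkIQ=="));  // prints "Hello, world!"
    }
}
```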

Secrets Management

The previous example assumed that the user was allowed to use the key for both encryption and decryption. But the ability to encrypt does not imply the ability to decrypt: you control access to the key using IAM policies, and can grant encryption to one set of users (developers) and decryption to another (your applications).

Let's start with the policy that controls encryption:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Encrypt"
            ],
            "Resource": [
                "arn:aws:kms:us-east-1:1234567890:key/dc46c8c3-2269-49ef-befd-b244c7f364af"
            ]
        }
    ]
}

For your own policy you would replace 1234567890 with your AWS account ID, and dc46… with the UUID of your own key. Note that you have to use a UUID rather than an alias (ie: alias/example from the command-line above); you can get this UUID from the AWS Console.

Attach this policy to the users (or better, groups) that are allowed to encrypt secrets (typically your entire developer group).

The decryption policy is almost identical; only the action has changed.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:us-east-1:1234567890:key/dc46c8c3-2269-49ef-befd-b244c7f364af"
            ]
        }
    ]
}

However, rather than attaching this policy to a user or group, attach it to an EC2 instance role. Then, inside your application, use this code to do the decryption:

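    // imports needed (AWS SDK for Java 1.x): com.amazonaws.services.kms.AWSKMS,
    // com.amazonaws.services.kms.model.DecryptRequest, com.amazonaws.services.kms.model.DecryptResult,
    // com.amazonaws.util.BinaryUtils, java.nio.ByteBuffer, java.nio.charset.StandardCharsets
    private String decodeSecret(AWSKMS kmsClient, String secret) {
        // the secret is stored as Base64 text; KMS wants the raw ciphertext bytes
        byte[] encryptedBytes = BinaryUtils.fromBase64(secret);
        ByteBuffer encryptedBuffer = ByteBuffer.wrap(encryptedBytes);

        DecryptRequest request = new DecryptRequest().withCiphertextBlob(encryptedBuffer);
        DecryptResult response = kmsClient.decrypt(request);

        // the plaintext comes back as a ByteBuffer; copy it out and decode as UTF-8
        byte[] plaintextBytes = BinaryUtils.copyAllBytesFrom(response.getPlaintext());
        return new String(plaintextBytes, StandardCharsets.UTF_8);
    }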

Note that you pass in the KMS client object: like other AWS client objects, these are intended to be shared. If you use a dependency-injection framework, you can create a singleton instance and inject it where needed. As with any other AWS client, you should always use the client's default constructor (or, for newer AWS SDK releases, the default client-builder), which uses the default provider chain to find actual credentials.
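Here's one way this could fit into application configuration. It's a sketch under assumptions, not a prescribed pattern: the "kms:" prefix that marks encrypted values is a convention I made up for this example, as are the class and method names. The decryption step is passed in as a function so that the sketch runs without AWS access; in a real application you would pass something like secret -> decodeSecret(kmsClient, secret).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SecretConfigSketch {

    // Hypothetical marker for encrypted values; not an AWS convention.
    static final String PREFIX = "kms:";

    // Returns a copy of the configuration in which every value marked with
    // the prefix has been run through the decryptor; other values pass through.
    static Map<String,String> resolve(Map<String,String> raw, Function<String,String> decryptor) {
        Map<String,String> result = new HashMap<>();
        for (Map.Entry<String,String> entry : raw.entrySet()) {
            String value = entry.getValue();
            if (value.startsWith(PREFIX)) {
                value = decryptor.apply(value.substring(PREFIX.length()));
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String,String> raw = new HashMap<>();
        raw.put("db.url", "jdbc:postgresql://localhost/example");
        raw.put("db.password", "kms:not-a-real-ciphertext");

        // stand-in decryptor so the sketch runs without calling KMS
        Function<String,String> fakeDecryptor = ciphertext -> "decrypted(" + ciphertext + ")";

        Map<String,String> resolved = resolve(raw, fakeDecryptor);
        System.out.println(resolved.get("db.url"));
        System.out.println(resolved.get("db.password"));
    }
}
```

Keeping the decryptor pluggable also means unit tests never need credentials: they can substitute a fake, as main() does here.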

This Doesn't Work!

There are a few things that can trip you up. Start debugging by verifying that you've assigned the correct permissions to the users/groups/roles that you think you have: open the user (or group, or role) in the AWS Console and click the “Access Advisor” tab. It's always worth clicking through to the policy, to ensure that you haven't accidentally assigned the encrypt policy to the decrypt user and vice-versa.

Amazon also provides a policy simulator that lets you verify explicit commands against a user/role/group. If you use it, remember that you have to explicitly reference the ARN of the resource that you're testing (in this case, the key); by default the Policy Simulator uses a wildcard (“*”), which will be rejected by a well-written policy.

When making changes to policies, remember that AWS is a distributed system, so changes may take a short amount of time to propagate (the IAM FAQ uses the term “almost immediately” several times).

And lastly, be aware that KMS keys have their own policies, and that the key's own policy must grant access to the AWS account for any IAM policies that reference the key to take effect. If you create your KMS key via the console it will have an appropriate default policy; this may not be the case if you create it via the SDK or command line (although the docs indicate that, even then, there's a default policy that grants access to the account, so this problem is unlikely).

But I don't want to be locked in to AWS!

I've heard several people raise this issue, about various AWS services. It's not an issue that I think much about: the last few companies that I've worked for were running 100% in AWS. We're already locked in, making use of multiple AWS services; KMS is just one more. If you're also running entirely in AWS, I think that you should embrace the services available, and not worry about lock-in; moving to another cloud provider isn't a task to be undertaken lightly even if you're just using compute services.

For those running hybrid deployments (part in the cloud, part in a datacenter), or who use cloud services from multiple vendors, the concern is perhaps more relevant, if only because your operations become dependent on the network connection between you and AWS.

The cost-benefit analysis in that case becomes a little more complex. I still think that KMS — or a similar service provided by other clouds, like Azure Key Vault — is worthwhile, simply because it applies rigor to your secrets management. With a little thought to your configuration management, you should be able to keep running even if the cloud is unavailable.

Wednesday, August 2, 2017

What I Look For When Evaluating Coding Challenges

Like many companies, my employer uses a coding challenge as part of the interview process. You may or may not like coding challenges, or think they're fair or representative, but I think they're a useful tool. This post explains why, and how I look at the challenge responses. Other companies and people may do things differently.

Our challenge is the third step in the interview process. The process begins with several developers reading the candidate's resume and voting. If a candidate receives a net positive vote, we proceed to the second stage, which is a conversation with our HR representative to go over the position. If the candidate considers him- or herself a fit, we send the coding challenge, at a date and time of the candidate's choosing.

The challenge is taken almost verbatim from one of the many coding interview books, a fact that one candidate pointed out to us with reference to the exact book and page — and followed that with an answer to a completely different question. We allow the candidate to take as much time as desired, with the suggestion that it should only require an hour; the implication is that if you're still struggling after several hours it's time to give up. Originally we had a hard one-hour time limit, but I found that candidates were making silly mistakes and pushed for the relaxed deadline in the hope that they'd take time for polishing. Unfortunately, candidates still make silly mistakes.

The question is relatively simple, but involves relationships between data structures, something that seems to give people trouble. It could be implemented using a six-line SQL query with a self-join, and if anyone ever submits that I will give him or her an immediate thumbs-up.

But, barring that, here are the things that I'm evaluating, in order of importance:

  • It's gotta run
    Seems self-evident, no? But it's amazing to me how many submissions have glaring bugs that indicate the candidate hasn't actually run the code. We used to give a pass if we could fix the bug (most of them were one-liners) and the algorithm was reasonable, but we stopped doing that: if you can't submit running code for your interview, why should we expect you to submit running code as part of your daily work?
  • It's gotta do what we asked
    The acceptance criteria are simple: “print X.” But it's amazing how many people do that only coincidentally. Again, we used to give a pass if you printed X along with Y and Z, but doing so raises the same issue as above: will you suddenly become less sloppy if we hire you? I'm betting no.
  • Use appropriate data structures
    This is a somewhat fuzzy criterion, but an example might help. If you use a List and then write deduplication logic, I'm going to wonder if you know what a Set is. That's not a show-stopper for an entry- or mid-level position, but it's a serious negative for a lead. That said, so far I haven't seen a submission that used inappropriate data structures and didn't have other significant problems.
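
To illustrate the List-versus-Set point, here's a small sketch that has nothing to do with our actual challenge (the class and method names are invented for the example). Both methods return the distinct values in first-seen order; the second lets the collection do the work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DedupSketch {

    // Hand-rolled deduplication: contains() scans the list on every
    // iteration, so this is quadratic in the size of the input.
    static List<String> dedupWithList(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String value : input) {
            if (!result.contains(value)) {
                result.add(value);
            }
        }
        return result;
    }

    // The Set rejects duplicates itself; LinkedHashSet also keeps
    // the order in which values were first seen.
    static Set<String> dedupWithSet(List<String> input) {
        return new LinkedHashSet<>(input);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(dedupWithList(input));  // prints [a, b, c]
        System.out.println(dedupWithSet(input));   // prints [a, b, c]
    }
}
```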

That's it. I'm not looking for a particular answer or a particular style of coding; there are several equally valid approaches. If you chose one of them (or something completely different that passes our test cases) you'll be invited to the next stage of the interview process, the technical phone screen. And that brings up my final expectation:

Be prepared to discuss your work

One of my personal annoyances with coding challenges is when they appear to go into the bitbucket: you don't know whether the company even looked at it, or just assigned it as a way to weed out people who weren't committed enough to submit anything. So when I do a phone interview, I open up the code and ask the candidate questions about it (they've been told that this will happen, so have the opportunity to prepare — it's still surprising to me how many don't).

I present the discussion as a code review: I want to see how the candidate responds to critique. And there's almost always something to criticize: the one-hour suggested time limit usually leads to corners being cut (although I did have one candidate whose code was almost perfectly written; he did coding challenges as a hobby). I believe a question like “why did you iterate this structure twice?” can lead to useful insights about candidates; at the least, it shows whether they can look at their own code dispassionately.

Other topics of my phone interview include asking the candidate to evaluate the code him- or herself, asking the candidate to compare his or her approach with the other standard approach to the problem (it's interesting: most candidates “see” just one approach or the other), and finally, asking what tests would be appropriate for the code.

Does all of this lead to a better candidate? To be honest, I think our false-positive rate is still too high: we get people who pass the coding challenge but then fail the in-person interview (which has design questions and a “do I want to work with this person” focus).

But compared to my experience at a former company that let HR do all the screening, it's a lot better: I've never had the experience of sitting down in a face-to-face interview with someone who has a long resume but no competence.