Thursday, December 27, 2018

log4j-aws-appenders now supports Logback

I started my Log4J appenders project because I wasn't happy with how the AWS CloudWatch agent broke apart logfiles. It seemed, as I said in the FAQ, like it would be an easy weekend project. That was nearly two years ago.

In the interim, I added support for Kinesis Streams (a year ago) to support search-engine-based centralized logging using AWS managed Elasticsearch. Surprisingly, it wasn't until after that implementation effort that I truly “bought into” the benefits of using a search engine to examine logs. Now I can't imagine going back to grep.

After giving a talk on centralized logging to the local Java users' group, some of the feedback that I got was “it's nice, but we're not using Log4J 1.x.” So in the early fall I started to break the library into pieces: a front-end that's tied to a particular logging framework, and a back-end that handles communication with AWS. This turned out to be quite easy, which I think means that I had a good design to start with.

Then it was a matter of picking another logging framework, and learning enough about it to be able to implement appenders. I picked Logback because it's the default logging framework for Spring, and because it's the native back-end for SLF4J (which I've been using with Log4J for around five years now).
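
As a rough illustration of that front-end / back-end split (a simplified, hypothetical sketch with names of my own; not necessarily the library's actual classes), a Logback front-end just translates framework events and hands them to a framework-agnostic writer:

// hypothetical back-end interface: knows about AWS, not the logging framework
interface LogWriter {
    void addMessage(String message, long timestamp);
    void stop();
}

// hypothetical Logback front-end: translates framework events into plain messages
class ExampleCloudWatchAppender
extends ch.qos.logback.core.AppenderBase<ch.qos.logback.classic.spi.ILoggingEvent> {
    private LogWriter writer;   // back-end, created during start(); omitted here

    @Override
    protected void append(ch.qos.logback.classic.spi.ILoggingEvent event) {
        writer.addMessage(event.getFormattedMessage(), event.getTimeStamp());
    }
}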

One of the interesting things that came out of this work is that I now see a good use case for multiple inheritance. There's an enormous amount of duplicated code because each appender has two is-a relationships: one to the logging framework and another to the back end. It would be nice if Java had something like Scala traits, where each trait would encapsulate one of the is-a relationships, and the appender would just be a combination of traits. On the other hand, I've seen enough ugly code using traits that I still think Gosling et al made the right decision.

Log4J 2.x is up next, although I think I'm going to take a break for a few months. I have several other projects that have taken a back seat, including a library of AWS utility classes that I started a year ago and that still hasn't seen its first release.

Happy Holidays!

Sunday, December 16, 2018

Database Connection Pools Need To Evolve

I never thought very deeply about connection pools, other than as a good example of how to use phantom references. Most of the projects that I worked on had already picked a pool, or defaulted their choice to whatever framework they were using. It is, after all, a rather mundane component, usually hidden behind the object relational manager that you use to interact with your database.

Then I answered this Stack Overflow question. And, after playing around a bit and writing a much longer answer, I realized that connection pools — even the newest — have been completely left behind by the current best practices in database management. Two of those practices, in particular:

  • Database credentials should be handled with care

    All of the pools that I've used expect you to provide database credentials in their configuration. Which means that you, as a security-conscious developer, need to retrieve those credentials from somewhere and manually configure the pool. Or, as a not-so-security-conscious developer, store them in plain text in a configuration file. In either case, you're doing this once, when the application starts. If there's ever a need to change credentials, you restart your application.

    That makes it difficult to practice credential rotation, where your database credentials change on a frequent basis to minimize the impact of losing those credentials. At the extreme, Amazon's RDS databases support generation of credentials that last only 15 minutes. But even if you rotate credentials on a monthly basis, the need to restart all of your applications turns this simple practice into a major undertaking, almost certainly manual, and one that may require application downtime.

  • Failover isn't a big deal

    Failover from primary database to replica has traditionally been a Big Event, performed manually, and often involving several people on a conference call. At a minimum you need to bring up a new replica, and with asynchronous, log-based replication there is always the chance of lost data. But with modern cluster-based database servers like Amazon Aurora, failover might happen during unattended maintenance. If the application can't recognize that it's in the middle of a short-duration failover and recover gracefully, then failover is still a Big Deal.

One solution to both of these problems is for the pool to provide hooks into its behavior: points where it calls out to user code to get the information it needs. For example, rather than read a static configuration variable for username and password, it would call a user-defined function for these values.
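
To make that concrete, here's a minimal sketch of what such a hook might look like. The CredentialsProvider interface and HookedConnectionFactory class are hypothetical names of my own; no existing pool exposes exactly this API.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// hypothetical hook: the pool calls this every time it opens a physical
// connection, so rotated credentials are picked up without a restart
interface CredentialsProvider {
    String getUsername();
    String getPassword();
}

// hypothetical factory that uses the hook rather than static configuration
class HookedConnectionFactory {
    private final String jdbcUrl;
    private final CredentialsProvider credentials;

    HookedConnectionFactory(String jdbcUrl, CredentialsProvider credentials) {
        this.jdbcUrl = jdbcUrl;
        this.credentials = credentials;
    }

    Connection newConnection() throws SQLException {
        return DriverManager.getConnection(
                jdbcUrl, credentials.getUsername(), credentials.getPassword());
    }
}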

And that's fine, but it made me realize something: the real problem is that current connection pools try to do two things. The first is managing the pool: creating connections, holding them until needed, handing them out, collecting them again when the application code forgets to close them, and trying to do all of this with the minimal amount of overhead. The second task is establishing a connection, which has subtle variations depending on the database in use.

I think that the next evolution in connection pools is to separate those behaviors, and turn the connection management code into a composable stack of behaviors that can be mixed and matched as needed. One person might need a MySQL connection factory that uses an IAM credentials provider and a post-connection check that throws if the database is read-only. Another person might want a Postgres connection factory that uses an environment-based credentials provider and is fine with a basic connection check.
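
Here's a sketch of what that composition might look like. Again, the interfaces and names are my own invention, not any existing pool's API; the read-only check is the kind of piece you'd plug in for an Aurora cluster, where the old primary can come back as a replica after failover.

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

// hypothetical pieces of the stack: each is small and replaceable
interface ConnectionFactory {
    Connection newConnection() throws SQLException;    // eg, MySQL- or Postgres-specific
}

interface ConnectionCheck {
    void validate(Connection cnxn) throws SQLException;
}

// example check: throw if we've connected to a read-only database
class RejectReadOnlyCheck implements ConnectionCheck {
    @Override
    public void validate(Connection cnxn) throws SQLException {
        if (cnxn.isReadOnly()) {
            throw new SQLException("connected to a read-only database");
        }
    }
}

// the pool composes the pieces: ask the factory for a connection, then run
// each configured check before handing the connection to application code
class ComposedConnectionSource {
    private final ConnectionFactory factory;
    private final List<ConnectionCheck> checks;

    ComposedConnectionSource(ConnectionFactory factory, List<ConnectionCheck> checks) {
        this.factory = factory;
        this.checks = checks;
    }

    Connection open() throws SQLException {
        Connection cnxn = factory.newConnection();
        for (ConnectionCheck check : checks) {
            check.validate(cnxn);
        }
        return cnxn;
    }
}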

The tricky part is deciding on the right interfaces. I don't know that the three-part stack that I just described is the right way to go. I'm pretty sure that overriding javax.sql.DataSource isn't. And after creating an API, there's the hard work of implementing it, although I think that you'll find a lot of people willing to contribute small pieces.

The real question: is there a connection pool maintainer out there who thinks the same way and is willing to break her pool into pieces?

Saturday, December 8, 2018

Using CloudWatch Insights With JSON Log Messages

I've never been a fan of CloudWatch Logs. On the positive side, it's available by default, is relatively cheap, and there are a variety of ways to send data to it (including my own log4j-aws-appenders). But the features it supported were extremely limited: you could view log messages by date/time range or do a simple text search. As a result, I felt that “serious” log analysis required a tool like Kibana, fed by a Kinesis-based pipeline. Or, if your volumes were low enough, a service provider such as Loggly or DataDog.

That opinion changed with the introduction of CloudWatch Logs Insights. This feature adds a query tab to the CloudWatch Logs console, allowing you to filter and summarize messages. For example, this query will show you the number of messages that contain the word “ERROR” (from your selected time range, which is specified outside of the query):

filter @message like /ERROR/
| stats count()

To actually see those errors, you'd replace the stats directive with fields, along with a sort to put them in reverse chronological order:

filter @message like /ERROR/
| fields @timestamp, @message
| sort @timestamp desc

That's nice, but it's still full-text search: if you're using Log4J it will pick up the messages that were logged at the ERROR level, but it will also pick up any other entries that happen to contain the word “ERROR” anywhere in the message.

But Insights isn't limited to simple text searches on the entire message: it will parse JSON messages and let you reference individual fields from those messages. And, as it happens, my appenders library provides a JSON layout class (originally intended for the Kinesis to Kibana pipeline linked above). To configure the CloudWatch appender with the JSON layout, you'd add these lines to your log4j.properties file (changing logGroup, logStream, and tags as appropriate):

log4j.appender.cloudwatch=com.kdgregory.log4j.aws.CloudWatchAppender
log4j.appender.cloudwatch.logGroup=AppenderExample
log4j.appender.cloudwatch.logStream={date}-{hostname}-{pid}
log4j.appender.cloudwatch.layout=com.kdgregory.log4j.aws.JsonLayout
log4j.appender.cloudwatch.layout.tags=applicationName=Example,runDate={date}
log4j.appender.cloudwatch.layout.enableHostname=true
log4j.appender.cloudwatch.layout.enableLocation=true

As I said, you can use individual fields from the JSON message in your queries. So to see only the Log4J errors, you would use the following (note: this uses the parsed “message” field, which is different from the unparsed “@message” field):

filter level == "ERROR"
| fields @timestamp, logger, message
| sort @timestamp desc
| limit 25

You can also aggregate the data. For example, this query counts the number of errors for each 15 minute period within your time range:

filter level == "ERROR"
| stats count() by bin(15m)

There are, of course, some caveats to this feature. While the documentation doesn't go into detail, it appears to work by brute-force scanning all of your log messages (unlike Kibana, which uses a search engine to index them). This means that queries take longer to run, and that the time increases with the amount of data you're examining.

You'll also pay (as of the time of writing) $0.005 per gigabyte of data scanned. This is cheap if you are careful to restrict your time periods and/or don't have a lot of logging output. But I suspect it will add up quickly.
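
To put that in perspective: at that price, a query that scans 200 GB of log data costs about $1.00, so an engineer who runs a few dozen broad queries while debugging an incident could easily spend tens of dollars in an afternoon.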

With those caveats, however, I think Insights is a great solution for organizations that don't have large logging volumes, don't want to ship their logs offsite, and don't want to pay for an Elasticsearch/Kibana cluster.