
Saturday, December 18, 2021

My take on the Log4J 2.x vulnerability

A week ago, a lot of Java application developers learned that their applications harbored a severe vulnerability, courtesy of the Log4J 2.x logging framework. The vulnerability, CVE-2021-44228, allowed execution of arbitrary code on any system that logged unsanitized user input using the Log4J 2.x library. This prompted a lot of people to scramble to protect their systems, and a lot of people to take to Twitter with opinions of What It All Means.

As the maintainer of an open-source library that integrates with Log4J, I spent my Saturday morning understanding the bug and updating my examples to use versions of the library that mitigated the problem. Fortunately, my library is not directly affected, as long as its consumers don't use unsanitized data to configure the logging framework.

Having done that, and having read as much as I could on the issue (the LunaSec writeups are excellent), I've decided to voice my own thoughts on the matter, some of which I haven't seen other people say, so they may be interesting to you.

First, I think that it's important to recognize that this vulnerability — like most things that end up in the news — is not the result of a single issue. Instead, it's a combination of features, some of which make perfect sense:

  1. Log4J 2.x provides string interpolation for configuration (aka “lookups”)

    This is a great idea; I implemented it for my own logging library. The idea is that you don't want your configuration files to contain hardcoded values, such as the address of your central syslog daemon (or, in my case, the name of a CloudWatch log group). Log4J 2.x provides a long list of lookups, ranging from environment variables to information from the Docker container running your application.

  2. One of the lookups retrieves data from the Java Naming and Directory Interface (JNDI) API

    In large deployments, it's nice to be able to centralize your configuration. There are lots of tools to do this, but Java Enterprise Edition settled on JNDI, also known as the javax.naming package. JNDI is an umbrella API for retrieving data from different sources, such as LDAP.

  3. javax.naming.spi.ObjectFactory supports loading code from a remote server

    This is a somewhat dubious idea, but the JNDI SPI (service provider interface) spec justifies it: it lets you define resources that can be retrieved directly from JNDI. The spec uses the example of a printer driver, which should be familiar to anyone who has installed a printer on their home computer (do you know that the driver you installed is trustworthy?).

    Note: this condition is necessary for remote-code execution, but not for data exfiltration.

  4. The Log4J 2.x PatternLayout also supports string interpolation, with the same set of lookups

    Here we're getting into questionable features in the library. I had no idea that this behavior existed, even though I dug deeply into the documentation and source code when implementing my appenders. The documentation for this layout class is quite long, and prior to the vulnerability, you had to infer the behavior based on the presence of the nolookups pattern option.

  5. Users log unsanitized data

    If users passed all of their logging messages through a sanitizer that looked for and removed the sequence ${, then the vulnerability wouldn't exist. Except nobody does that, because why would you? I call it out because I think it's important to consider what you're logging, and whether it might contain information that you don't want to log. As I say in my “effective logging” talk, passwords can hide in the darndest places.
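For illustration, here's what such a sanitizer might look like. The class is my own invention, not part of any library, and note that a single replace pass isn't enough: removing one occurrence can splice a new `${` together.

```java
public class LogSanitizer {
    // Repeatedly strip the "${" sequence that triggers Log4J 2.x lookups.
    // A loop is required because one replace pass can create new pairs:
    // "$${{" collapses to "${" after a single pass.
    public static String sanitize(String message) {
        if (message == null) return null;
        while (message.contains("${")) {
            message = message.replace("${", "");
        }
        return message;
    }
}
```

The loop terminates because each pass strictly shortens the string.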

These are the things that need to happen to make this vulnerability affect you. If you can prevent one of them from happening, you're not vulnerable … to this specific issue. And there are ways to do this, although they aren't things that you can easily implement once the vulnerability is discovered. I'm writing a separate post on how Cloud deployments can mitigate similar vulnerabilities.
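That said, two stopgap mitigations circulated widely in the days after disclosure. Upgrading to a fixed release remained the real fix, and the first of these was later shown to be incomplete for some non-default configurations:

```
# JVM argument: disable message lookups (Log4J 2.10 through 2.14.1)
-Dlog4j2.formatMsgNoLookups=true

# older 2.x versions: remove the lookup class from the jar entirely
zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class
```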

For now, however, I want to focus on two of the responses that I saw online: inspecting code before using it, and paying the maintainers of open-source projects. In my opinion, neither of these is valid, and they distract from preparing for the next serious vulnerability.

Response #1: inspect code before you use it.

For any non-trivial application, this is simply impossible.

The current “trunk” revision of log4j-core has 89,778 lines of code, excluding test classes (measured using find and wc). That doesn't count any add-on libraries that you might use to write to your preferred destination, such as my log4j-aws-appenders (which has over 10,000 lines of mainline code). And logging is a tiny part of a modern application, which is typically built using a framework such as Spring, and runs on an application server such as Tomcat or Jetty.
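For the record, that measurement is a one-liner; this sketch assumes a standard Maven layout, with tests living separately under src/test:

```shell
# count lines in mainline Java sources, excluding tests
find src/main/java -name '*.java' -print0 | xargs -0 cat | wc -l
```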

And even if your company is willing to pay for the staff to read all of that source code, what is the chance that they will discover a vulnerability? What is the chance that they will even learn all of its behaviors? I've spent quite a bit of time with the Log4J 2.x code and documentation, both to write my library and on a post describing how to write Log4J plugins, and I never realized that it applied string interpolation to the raw log messages. After I knew about the vulnerability, of course, it was easy to find the code responsible.

Response #2: we — or at least large companies — should be paying the maintainers of open-source projects.

I don't know the situation of the core Log4J developers, and whether or not they get paid for their work on the project.¹ But I don't believe that this vulnerability was the result of an overworked, unpaid, solo developer. True, there was no discussion on the commit that introduced JNDI lookups (ignoring the tacky post-hoc commenters). But the code that applies string substitution to logging events has been in the library since the 2.0 release, and has been rewritten multiple times since then.

In fact, I think direct corporate sponsorship would lead to more unexpected behavior, because the sponsoring corporations will all have their own desires, and an expectation that those desires will be met. And if my experience in the corporate world is any guide, a developer that feels their paycheck is in jeopardy is much more likely to do something without giving it full thought.

So where does that leave us?

Unfortunately, I haven't seen very many practical responses (although, as I said, I'm writing a separate post to this effect). And I think that the reason for that is that the real answer puts the onus for vulnerabilities on us, the consumers of open-source software.

Linus's Law, coined by Eric S. Raymond, is that “given enough eyeballs, all bugs are shallow.” And this played out in full view with this vulnerability: the Log4J team quickly found and patched the problem, and has been releasing additional patches since.² There have also been multiple third-parties that have written detailed evaluations of the vulnerability.³

But still, we, the consumers of this package, needed to update our builds to use the latest version. And then deploy those updated builds. If you didn't already have a process that allows you to quickly build and deploy a hotfix, then your weekend was shot. Even if you did get a hotfix out, you needed to spend time evaluating your systems to ensure that they weren't compromised (and beware that the most likely compromise was exfiltration of secrets!).

It's natural, in such circumstances, to feel resentment, and to look for external change to make such problems go away. Or maybe even to think about reducing your reliance on open-source projects.

But in my opinion, this is the wrong outcome. Instead, look at this event as a wake-up call to make your systems more robust. Be prepared to do a hotfix at a moment's notice. Utilize tools such as web application firewalls, which can be quickly updated to block malicious traffic. And improve your forensic logging, so that you can identify the effects of vulnerabilities after they appear (just don't log unsanitized input!).

Because this is not the last vulnerability that you will see.


1: You might be interested to learn that the Apache Software Foundation receives approximately $3 million a year (annual reports here). However, to the best of my knowledge, they do not pay stipends to core maintainers. I do not know how the core Log4J 2.x maintainers earn their living.

2: Log4J 2.x change report.

3: I particularly like the CloudFlare posts, especially the one that describes how attackers changed their payloads to avoid simple firewall rules. This post informed my belief that the goal of most attacks was to exfiltrate secrets rather than take over systems.

Sunday, December 16, 2018

Database Connection Pools Need To Evolve

I never thought very deeply about connection pools, other than as a good example of how to use phantom references. Most of the projects that I worked on had already picked a pool, or defaulted their choice to whatever framework they were using. It is, after all, a rather mundane component, usually hidden behind the object relational manager that you use to interact with your database.

Then I answered this Stack Overflow question. And, after playing around a bit and writing a much longer answer, I realized that connection pools — even the newest — have been completely left behind by the current best practices in database management. Two of those practices, in particular:

  • Database credentials should be handled with care

    All of the pools that I've used expect you to provide database credentials in their configuration. Which means that you, as a security-conscious developer, need to retrieve those credentials from somewhere and manually configure the pool. Or, as a not-so-security-conscious developer, store them in plain text in a configuration file. In either case, you're doing this once, when the application starts. If there's ever a need to change credentials, you restart your application.

    That makes it difficult to practice credential rotation, where your database credentials change on a frequent basis to minimize the impact of losing those credentials. At the extreme, Amazon's RDS databases support generation of credentials that last only 15 minutes. But even if you rotate credentials on a monthly basis, the need to restart all of your applications turns this simple practice into a major undertaking, almost certainly manual, and one that may require application downtime.

  • Failover isn't a big deal

    Failover from primary database to replica has traditionally been a Big Event, performed manually, and often involving several people on a conference call. At a minimum you need to bring up a new replica, and with asynchronous, log-based replication there is always the chance of lost data. But with modern cluster-based database servers like Amazon Aurora, failover might happen during unattended maintenance. If the application can't recognize that it's in the middle of a short-duration failover, that means it's still a Big Deal.

One solution to both of these problems is for the pool to provide hooks into its behavior: points where it calls out to user code to get the information it needs. For example, rather than read a static configuration variable for username and password, it would call a user-defined function for these values.

And that's fine, but it made me realize something: the real problem is that current connection pools try to do two things. The first is managing the pool: creating connections, holding them until needed, handing them out, collecting them again when the application code forgets to close them, and trying to do all of this with the minimal amount of overhead. The second task is establishing a connection, which has subtle variations depending on the database in use.

I think that the next evolution in connection pools is to separate those behaviors, and turn the connection management code into a composable stack of behaviors that can be mixed and matched as needed. One person might need a MySQL connection factory that uses an IAM credentials provider and a post-connection check that throws if the database is read-only. Another person might want a Postgres connection factory that uses an environment-based credentials provider and is fine with a basic connection check.
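To make that concrete, here's a sketch of the interfaces such a stack might expose. Every name here is my invention, not an existing pool's API:

```java
import java.sql.Connection;
import java.sql.SQLException;

// Supplies credentials on demand, so rotation never requires a restart.
interface CredentialsProvider {
    String username();
    String password();
}

// Knows the database-specific mechanics of establishing a connection.
interface ConnectionFactory {
    Connection newConnection(CredentialsProvider credentials) throws SQLException;
}

// Validates a connection before the pool hands it out
// (e.g., throw if the database turns out to be a read-only replica).
interface ConnectionCheck {
    void verify(Connection cnxn) throws SQLException;
}

// One composable piece: credentials pulled from the environment.
class EnvCredentialsProvider implements CredentialsProvider {
    @Override public String username() { return System.getenv().getOrDefault("DB_USER", "app"); }
    @Override public String password() { return System.getenv().getOrDefault("DB_PASSWORD", ""); }
}
```

A pool would then be assembled from whichever pieces fit your environment: an IAM-based provider for one team, the environment-based one above for another.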

The tricky part is deciding on the right interfaces. I don't know that the three-part stack that I just described is the right way to go. I'm pretty sure that overriding javax.sql.DataSource isn't. And after creating an API, there's the hard work of implementing it, although I think that you'll find a lot of people willing to contribute small pieces.

The real question: is there a connection pool maintainer out there who thinks the same way and is willing to break her pool into pieces?

Friday, January 27, 2017

Trusting the Internet: Picking Third-Party Libraries

Many applications today are like the human body:* a relatively small proportion of “in-house” code, leveraged by dozens if not hundreds of third-party libraries — everything from object-relational mappers to a single function that left-pads a string. And that leads to a conundrum: how do you pick the libraries that you include in your project? Or in other words, is it OK to download something from the Internet and make it a fundamental part of your business?

Sometimes, of course, you don't have a choice. If you use the JUnit testing framework, for example, you are going to get the Hamcrest library along with it (and maybe you'll feel some concern that the hamcrest.org domain is no longer registered). But what criteria did you use to pick JUnit?

I was recently faced with that very question, in looking for a library to parse and validate JSON Web Tokens in Java. There are several libraries to choose from; these are the criteria that I used to pick one, from most important to least.

  1. Following the crowd
    If 100,000 projects use a particular library without issue, chances are good that you can too. But how do you know how many projects use a library? For JavaScript projects, npm gives you numbers of downloads; ditto for Ruby projects and the gems they use. Java projects don't have it so easy: while Maven Central does keep statistics on downloads (they're available to package maintainers), that information isn't available to consumers (other than a listing of the top 10 downloads).

    One interesting technique for following the crowd is looking in your local repository, to see if the package is already there as a dependency of another library. If you can create a dependency tree for your project, look at where the candidate lives in the tree: is it close to the root or deep in the weeds? Is it included by multiple other libraries or just one? These are all signals of how the rest of the world views the library.

  2. Documentation
    I believe that care in documentation is a good proxy for care in implementation. Things I want to see are complete JavaDoc and examples. For projects hosted on GitHub, I should be able to understand the library based solely on the README (and there's no reason that a non-GitHub project should omit a README, although I admit to being guilty in that regard).

  3. Author Credibility
    This can be difficult, especially where the “author” is a corporation (although large corporations tend to do their own vetting before letting projects out under the corporate name). In the case of a sole maintainer, I Google the person's name and see what comes up. I'd like to see web pages that demonstrate deep knowledge of the subject (especially for security-related libraries). Even better are slides from a conference, because that implies that the author has at least some recognition in the community.

  4. Issue Handling
    Every library has issues. Does the maintainer respond to them in a reasonable timeframe? A large number of outstanding issues should raise a red flag, as should a maintainer that responds in a non-professional manner. You wouldn't accept that from a coworker (I hope), and by using a package you make the maintainer your coworker.

Once I have decided on a candidate library (or small number of candidates) I try it out for my use case. If it looks good, it becomes part of my application. One thing that I do not do is dig into the library source code.

The promise of open-source software is that you can download the sources and inspect them. The reality is that nobody ever does that — and nobody could, because it would be more than a full-time job. So we choose as best we can, and hope that there isn't a dependency-of-a-dependency-of-a-dependency that's going to hurt us.


* The reference is to the amount of bacteria and other organisms that don't share your DNA but live on or in your body. You'll often find a 10:1 (bacteria:human) ratio quoted, but see this article for commentary on the history and validity of that ratio (tl;dr, it's more like 60:40).

Monday, November 23, 2015

Java Object Serialization and Untrusted Code Execution

This past week we had a mini-fire-drill at work, in response to a CERT vulnerability note titled “Apache Commons Collections Java library insecurely deserializes data.” We dutifully updated our commons-collections dependencies, but something didn't quite smell right to me, so I did some digging. And after a few hours of research and experimentation, I realized that commons-collections was only a convenient tool to exploit the real weakness.

Yes, if you use commons-collections, you should update it to the latest version. Today. Now.

But more important, you need to review your codebase for any code that might deserialize untrusted data. This includes references to ObjectInputStream, but it also includes any RMI servers that you expose to the outside world. And don't stop at your code, but also look at your dependencies to make sure that they don't deserialize untrusted data. Once you've done that, read on for an explanation.

Actually, there are two things you should read first. The first is the slide deck from “Marshalling Pickles,” which describes how this exploit works (for multiple languages, not just Java). The second is my article on Java serialization, especially the section on readObject().

Now that you've read all that, I'm going to summarize the exploit:

  • The Java serialization protocol is public and easy to read. Anybody can craft an arbitrary object graph and write it to a file.
  • To deserialize an object, that object's classfile must be on the classpath. This means that an attacker can't simply write a malicious class, compile it, and expect your system to execute it (unless they can also get the bytecode into your system, in which case you've already lost). This, however, is a false sense of security.
  • Apache Commons Collections provides a number of functor objects that perform invocation of Java methods using reflection. This means that, if Commons-Collections is on your classpath, the attacker can execute arbitrary code, by providing it as serialized data. And because it's so useful, Commons-Collections is on many classpaths, including many app-servers.
  • Commons-Collections also provides a LazyMap class that invokes user-defined functors via its get() method. This means that, if you can get the target JVM to accept and access such a map, you can execute arbitrary code.
  • The JRE internal class sun.reflect.annotation.AnnotationInvocationHandler has a member variable declared as a Map (rather than, say, a HashMap), and it accesses this map as part of restoring the object's state in its readObject() method.

Putting all these pieces together, the exploit is a serialized AnnotationInvocationHandler that contains a LazyMap that invokes functors that execute arbitrary code. When this graph is deserialized, the functors are executed, and you're pwned.
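You don't need Commons-Collections to see the underlying mechanism: code runs during deserialization, before the caller ever touches the resulting object. Here's a harmless, self-contained stand-in for the exploit chain:

```java
import java.io.*;

public class DeserializationDemo {
    static boolean sideEffect = false;

    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // this runs as soon as the stream is read -- in a real
            // exploit, the attacker's code executes right here
            sideEffect = true;
        }
    }

    public static void main(String[] args) throws Exception {
        // "attacker" side: craft a serialized object graph
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Payload());
        }

        // "victim" side: merely reading the stream triggers the side effect
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            in.readObject();
        }
        System.out.println("side effect during deserialization: " + sideEffect);
    }
}
```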

There are a few points to take away from this description. The first, of course, is that uncontrolled deserialization is a Bad Idea. That doesn't mean that serialization is itself a bad idea; feel free to serialize and deserialize your internal data. Just be sure that you're not deserializing something that came from outside your application.

The second take-away is that code-as-data can be dangerous. As I said above, there's no way to put arbitrary code on your classpath, but the exploit doesn't rely on that: it relies on passing arbitrary data into your application. Clojure fans might not like the implications.

For that matter, pay attention to what goes onto your classpath! Apache Commons is an incredibly useful collection of utility classes; if you don't use them, it's likely that one of your dependencies does. Or you have a dependency that offers features that are just waiting to be exploited. My KDGCommons library, for example, has a DefaultMap; all it's missing is a reflection-based functor (and a much wider usage).

The final — and most important — take-away is to pay attention to the variables in your serialized classes. The attack vector here is not commons-collections, it's AnnotationInvocationHandler and its Map. If, instead, this member were defined as a concrete class such as HashMap, the exploit would not have worked: attempting to deserialize a LazyMap when a HashMap is expected would cause an exception. I know, using a concrete class goes against everything that we're taught about choosing the least-specific type for a variable, but security takes precedence over purity (and you don't need to expose the concrete type via the object's API).
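Here's the pattern in a self-contained (and entirely hypothetical) class: a concrete field type, plus validation in readObject():

```java
import java.io.*;
import java.util.HashMap;

public class UserPrefs implements Serializable {
    private static final long serialVersionUID = 1L;

    // concrete type: a serialized LazyMap (or any other Map
    // implementation) can't be smuggled into this field
    private HashMap<String, String> prefs = new HashMap<>();

    public void put(String key, String value) { prefs.put(key, value); }
    public String get(String key) { return prefs.get(key); }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // belt and suspenders: validate restored state before anyone uses it
        if (prefs == null) throw new InvalidObjectException("missing prefs map");
    }
}
```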

If you're not scared enough already, I'll leave you with this: in the src.zip for JDK 1.7.0_79, there are a little over 600 classes that implement Serializable. And there are a bunch of these that define a private Map. Any of those could be a vector for the same attack.

Friday, July 24, 2015

Have I Been Hacked?

Twenty-five years later I can still remember how I felt, returning home that day. Being burgled is a surrealistic experience: you notice a progression of little things that don't belong, and it may be quite a long time before you realize that your world has changed. For me, the first anomaly was that the front door of our two-family house was unlocked. OK, maybe my neighbor was out in the back yard. The second was that I could hear the music that I left on for the bird. As I walked up the stairway (unchanged), I saw that my front door was open. I didn't notice that the wood around the lock was splintered. I walked into what seemed a perfectly normal front room, and finally realized that something was wrong when I saw the VCR hanging by its antenna cable (+1 for tightening the cable with a wrench).

Now imagine a different scenario: the front door locked, the upper door standing open but unlocked, lights on that shouldn't be, and a few dirty dishes on the counter. All things that could be explained by me being particularly forgetful that morning. But still the sense that something was not quite right.

That was my feeling this week, as several of my friends reported receiving spam emails from me, a warning that my Yahoo account might have been hacked. The first report was from my neighbor, and I discounted it, thinking that the spammer might have hit another neighbor, gotten the neighborhood mailing list, and paired up random people. After all, the “From” address wasn't even mine! But then I got reports from friends who weren't neighbors, and even found a couple of the emails in my own GMail spam folder.

OK, so my Yahoo account got hacked, big deal. Maybe next time I'll be more careful with sharing passwords.

Except … the Yahoo account has its own password, one that's not saved anywhere but my head, so I have a hard time accepting that the account was actually broken into. And Yahoo's “activity report” claims that all logins in the past 30 days came from my home IP (a nice feature, it's one of the tabs on the profile page). And, I can still log into the account. I've never heard of anyone breaking into an account and just leaving it there, untouched.

And when I looked at the message, my email address wasn't to be found anywhere. It was “kdgregory,” but at some server in a .cx domain. Different people reported different domains. Nor was a Yahoo server to be found in the headers. OK, headers can be forged, but I would have expected a forgery that at least attempted to look credible. According to the IP addresses in the headers, this email originated somewhere in India, went through a Japanese server, and then to its destination.

So I'm left with wondering what happened. Clearly these emails were based on information from my account, either headers from old messages (likely) or a contact list (less likely). But how? Googling for “yahoo data breach” turns up quite a few news stories, but nothing from this year. Did whoever acquired these addresses just sit on them for a year? And if yes, what other information did they glean from my account?

It's disquieting, this sense of perhaps being compromised. I think I would have been happier if they had changed my password and locked me out of the account. At least then I'd know I was being sloppy about security. As it is, I have no idea what (if anything) I did wrong. Or whether it will happen again.

Monday, August 17, 2009

Designing a Wishlist Service: Security

Security is the red-headed stepchild of the web-service world. Sure, there's WS-Security, which may be useful if you're using SOAP in server-server communication. But most services, in particular, browser-accessed services, rely on the security mechanisms provided by HTTP and HTTPS. If you want authentication, you use a 401 response and expect the requester to provide credentials in subsequent requests. If you're paranoid, you use a client certificate.

The wishlist service doesn't quite fit into this model. There's a clear need for some form of authorization and control, if only to protect the service from script kiddies with too much time on their hands. Yet authentication has to be transparent for usability: we don't want to pop up login dialogs, particularly if the customer has already logged in to their eCommerce account.

Perhaps the biggest constraint on any security mechanism is that requests will come from a browser. This means that the service can't expect the requester to provide any secret information, because anybody can view the page or script (I'm amazed by just how many web services require a “secret key” in their URL). If a page needs to provide secret information, that information must be encrypted before it is delivered to the user.

And secret information is required, because there are different levels of authorization: some people can update a list, others can only read it. The solution that I chose is to encrypt the wishlist key and user's authorization level, and pass that to the service as a URL parameter. This does represent another departure from pure REST, in that there may be multiple URLs that refer to the same underlying resource, but it seems the most reasonable compromise.

It also has the limitation that anybody who has a URL has the access defined by that URL. In some web services, this would be a problem. In the case of the wishlist service, it's mitigated by several factors.

First among these is that the value of this data just isn't that high. Sure, if you plan to be nominated for the Supreme Court, you probably don't want to keep a wishlist of X-rated videos. But for most people, a wishlist of clothing just isn't that important (and an opt-in warning can scare away those who feel differently). A related factor is that, in normal usage, people won't be handing out these URLs — at least, not URLs that can make changes. There's simply no reason to do so.

A second mitigating factor is that each URL also has an encrypted timeout: the service will reject any requests that arrive after that timeout. While this can be used to exert control over shared lists, it is primarily intended to defeat script kiddies who might try a denial-of-service attack via constant updates.
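As an illustration of what such a URL parameter might look like today, here's a sketch using AES-GCM and URL-safe Base64. The field layout and names are mine (the original service predates GCM), and I'm assuming list keys never contain the | delimiter:

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class WishlistToken {
    private static final SecureRandom rnd = new SecureRandom();

    // encrypts "listKey|authLevel|expiresAtMillis" into a URL-safe string
    public static String create(SecretKey key, String listKey, String authLevel, long expiresAt) throws Exception {
        byte[] iv = new byte[12];
        rnd.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(
                (listKey + "|" + authLevel + "|" + expiresAt).getBytes(StandardCharsets.UTF_8));
        byte[] token = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, token, 0, iv.length);
        System.arraycopy(ciphertext, 0, token, iv.length, ciphertext.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(token);
    }

    // decrypts and enforces the timeout; GCM authentication also rejects tampering
    public static String[] parse(SecretKey key, String token, long now) throws Exception {
        byte[] raw = Base64.getUrlDecoder().decode(token);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, raw, 0, 12));
        byte[] plaintext = cipher.doFinal(raw, 12, raw.length - 12);
        String[] fields = new String(plaintext, StandardCharsets.UTF_8).split("\\|");
        if (now > Long.parseLong(fields[2]))
            throw new SecurityException("token expired");
        return fields; // { listKey, authLevel, expiresAt }
    }
}
```

Because the payload is authenticated as well as encrypted, any bit-twiddled URL fails outright rather than decoding to garbage.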

A third mitigating factor is that we keep close control over secrets, particularly those that can be used to break the encryption — in other words, security through obscurity. Bletchley Park would not have been nearly so successful at breaking Enigma if they hadn't learned to recognize radio operators. Taking heed of that, none of the information used by the encrypted parameter is provided in plaintext, meaning that any attempts at discovering the key will require heuristic analysis rather than a simple String.contains().