This past week we had a mini-fire-drill at work, in response to a CERT vulnerability note titled “Apache Commons Collections Java library insecurely deserializes data.” We dutifully updated our commons-collections dependencies, but something didn't quite smell right to me, so I did some digging. After a few hours of research and experimentation, I realized that commons-collections was only a convenient tool for exploiting the real weakness.
Yes, if you use commons-collections, you should update it to the latest version. Today. Now.
But more important, you need to review your codebase for any code that might deserialize untrusted data. This includes references to ObjectInputStream, but it also includes any RMI servers that you expose to the outside world. And don't stop at your code, but also look at your dependencies to make sure that they don't deserialize untrusted data. Once you've done that, read on for an explanation.
Actually, there are two things you should read first. The first is the slide deck from “Marshalling Pickles,” which describes how this exploit works (for multiple languages, not just Java). The second is my article on Java serialization, especially the section on readObject().
Now that you've read all that, I'm going to summarize the exploit:
- The Java serialization protocol is public and easy to read. Anybody can craft an arbitrary object graph and write it to a file.
- To deserialize an object, that object's classfile must be on the classpath. This means that an attacker can't simply write a malicious class, compile it, and expect your system to execute it (unless they can also get the bytecode into your system, in which case you've already lost). This, however, is a false sense of security.
- Apache Commons Collections provides a number of functor objects that invoke Java methods using reflection. This means that, if Commons-Collections is on your classpath, the attacker can execute arbitrary code by providing it as serialized data. And because it's so useful, Commons-Collections is on many classpaths, including those of many app-servers.
- Commons-Collections also provides a LazyMap class that invokes user-defined functors via its get() method. This means that, if you can get the target JVM to accept and access such a map, you can execute arbitrary code.
- The JRE internal class sun.reflect.annotation.AnnotationInvocationHandler has a member variable declared as a Map (rather than, say, a HashMap). Moreover, its readObject() method accesses this map as part of restoring the object's state.
Putting all these pieces together, the exploit is a serialized AnnotationInvocationHandler that contains a LazyMap that invokes functors that execute arbitrary code. When this graph is deserialized, the functors are executed, and you're pwned.
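To make the mechanism concrete, here's a minimal, harmless sketch of the same shape. Everything in it is a stand-in that I've made up for illustration: TriggerMap plays the role of LazyMap (its get() just bumps a counter where the real thing would run a functor chain), and MapHolder plays the role of AnnotationInvocationHandler (a Map-typed field that readObject() touches). Neither is the real class, and the real exploit needs more steps than this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for LazyMap: a serializable Map that runs code in get().
class TriggerMap extends AbstractMap<String, String> implements Serializable {
    private static final long serialVersionUID = 1L;

    // Counts get() calls; a real attack would invoke its functors here instead.
    static final AtomicInteger CALLS = new AtomicInteger();

    @Override
    public String get(Object key) {
        CALLS.incrementAndGet();
        return null;
    }

    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        return Collections.<Map.Entry<String, String>>emptySet();
    }
}

// Hypothetical stand-in for AnnotationInvocationHandler: the field is declared
// as the Map interface, so any serializable Map implementation is accepted.
class MapHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    private Map<String, String> values;

    MapHolder(Map<String, String> values) {
        this.values = values;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Touching the map while restoring state hands control to whatever
        // Map implementation the stream supplied.
        values.get("anything");
    }
}

class TriggerDemo {
    // Serializes the object to a byte array and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        roundTrip(new MapHolder(new TriggerMap()));
        // Nothing in main() calls get(); the one call happens inside readObject().
        System.out.println("get() calls during deserialization: " + TriggerMap.CALLS.get());
    }
}
```

The application code never calls get() itself. The call happens inside readObject(), before the caller of ObjectInputStream.readObject() gets a chance to look at what it received.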
There are a few points to take away from this description. The first, of course, is that uncontrolled deserialization is a Bad Idea. That doesn't mean that serialization is itself a bad idea; feel free to serialize and deserialize your internal data. Just be sure that you're not deserializing something that came from outside your application.
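If you can't avoid reading serialized data that crosses a trust boundary, one common mitigation is to restrict which classes the stream is allowed to instantiate. Here's a minimal sketch that overrides ObjectInputStream.resolveClass(); the class names AllowListObjectInputStream and AllowListDemo are my own, and the allow-list contents are just an example. Treat it as a starting point, not a complete defense.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: refuses to resolve any class that is not explicitly allowed.
class AllowListObjectInputStream extends ObjectInputStream {
    private final Set<String> allowed;

    AllowListObjectInputStream(InputStream in, String... allowedClassNames) throws IOException {
        super(in);
        this.allowed = new HashSet<String>(Arrays.asList(allowedClassNames));
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        // Called for each class descriptor in the stream, before an instance is created.
        if (!allowed.contains(desc.getName())) {
            throw new InvalidClassException(desc.getName(), "not on the deserialization allow-list");
        }
        return super.resolveClass(desc);
    }
}

class AllowListDemo {
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] data, String... allowedClassNames)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new AllowListObjectInputStream(new ByteArrayInputStream(data), allowedClassNames)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] listBytes = serialize(new ArrayList<String>(Arrays.asList("a", "b")));
        System.out.println("accepted: " + deserialize(listBytes, "java.util.ArrayList"));

        byte[] mapBytes = serialize(new HashMap<String, String>());
        try {
            deserialize(mapBytes, "java.util.ArrayList");
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check runs before the class is instantiated, so a class that isn't on the list never gets to run its readObject(). Newer JDKs ship a built-in mechanism for the same idea (java.io.ObjectInputFilter, added in Java 9), which is probably the better choice where it's available.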
The second take-away is that code-as-data can be dangerous. As I said above, there's no way to put arbitrary code on your classpath, but the exploit doesn't rely on that: it relies on passing arbitrary data into your application. Clojure fans might not like the implications.
For that matter, pay attention to what goes onto your classpath! Apache Commons is an incredibly useful collection of utility classes; if you don't use them, it's likely that one of your dependencies does. Or you have a dependency that offers features that are just waiting to be exploited. My KDGCommons library, for example, has a DefaultMap; all it's missing is a reflection-based functor (and a much wider usage).
The final, and most important, take-away is to pay attention to the variables in your serialized classes. The attack vector here is not commons-collections, it's AnnotationInvocationHandler and its Map. If, instead, this member were declared as a concrete class such as HashMap, the exploit would not have worked: attempting to deserialize a LazyMap when a HashMap is expected would cause an exception. I know, using a concrete class goes against everything that we're taught about choosing the least-specific type for a variable, but security takes precedence over purity (and you don't need to expose the concrete type via the object's API).
If you're not scared enough already, I'll leave you with this: in the src.zip for JDK 1.7.0_79, there are a little over 600 classes that implement Serializable. And there are a bunch of these that define a private Map. Any of those could be a vector for the same attack.
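If you want to run the same kind of survey over your own classes, here's a rough sketch of a reflection-based check. MapFieldScanner and the two sample classes are names I made up for illustration. It flags non-static, non-transient fields of a Serializable class whose declared type is an interface or abstract Map type. It only looks at fields declared directly on the class, it ignores custom serialization such as serialPersistentFields, and a flagged field is only a candidate: it matters if something touches the map during deserialization.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists serialized fields declared with an "open" Map type.
class MapFieldScanner {
    static List<String> riskyMapFields(Class<?> type) {
        List<String> result = new ArrayList<String>();
        if (!Serializable.class.isAssignableFrom(type)) {
            return result; // not serializable, so default serialization does not apply
        }
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue; // not part of the default serialized form
            }
            Class<?> declared = field.getType();
            boolean openType = declared.isInterface() || Modifier.isAbstract(declared.getModifiers());
            if (openType && Map.class.isAssignableFrom(declared)) {
                result.add(field.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(riskyMapFields(InterfaceTypedSample.class)); // prints [attrs]
        System.out.println(riskyMapFields(ConcreteTypedSample.class));  // prints []
    }
}

// Sample: a Map-typed field that would accept any serializable Map implementation.
class InterfaceTypedSample implements Serializable {
    private static final long serialVersionUID = 1L;
    private Map<String, String> attrs = new HashMap<String, String>();
    private transient Map<String, String> cache; // transient, so it is skipped
}

// Sample: the same data, declared with a concrete type.
class ConcreteTypedSample implements Serializable {
    private static final long serialVersionUID = 1L;
    private HashMap<String, String> attrs = new HashMap<String, String>();
}
```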