blog.kdgregory.com: jvm

Showing posts with label jvm. Show all posts

Monday, November 23, 2015

Java Object Serialization and Untrusted Code Execution

This past week we had a mini-fire-drill at work, in response to a CERT vulnerability note titled “Apache Commons Collections Java library insecurely deserializes data.” We dutifully updated our commons-collections dependencies, but something didn't quite smell right to me, so I did some digging. And after a few hours of research and experimentation, I realized that commons-collection was only a convenient tool to exploit the real weakness.

Yes, if you use commons-collection, you should update it to the latest version. Today. Now.

But more important, you need to review your codebase for any code that might deserialize untrusted data. This includes references to ObjectInputStream, but it also includes any RMI servers that you expose to the outside world. And don't stop at your code, but also look at your dependencies to make sure that they don't deserialize untrusted data. Once you've done that, read on for an explanation.

Actually, there are two things you should read first. The first is the slide deck from “Marshalling Pickles,” which describes how this exploit works (for multiple languages, not just Java). The second is my article on Java serialization, especially the section on readObject().

Now that you've read all that, I'm going to summarize the exploit:

The Java serialization protocol is public and easy to read. Anybody can craft an arbitrary object graph and write it to a file.
To deserialize an object, that object's classfile must be on the classpath. This means that an attacker can't simply write a malicious class, compile it, and expect your system to execute it (unless they can also get the bytecode into your system, in which case you've already lost). This, however, is a false sense of security.
Apache Commons Collections provides a number of functor objects that perform invocation of Java methods using reflection. This means that, if Commons-Collections is on your classpath, the attacker can execute arbitrary code, by providing it as serialized data. And because it's so useful, Commons-Collections is on many classpaths, including many app-servers.
Commons-Collection also provides a LazyMap class that invokes user-defined functors via its get(). This means that, if you can get the target JVM to accept and access such a map, you can execute arbitrary code.
The JRE internal class sun.reflect.annotation.AnnotationInvocationHandler has a member variable as a Map (rather than, say, a HashMap). Moreover, it has a readObject() method that accesses this map as part of restoring the object's state in its readObject() method.

Putting all these pieces together, the exploit is a serialized AnnotationInvocationHandler that contains a LazyMap that invokes functors that execute arbitrary code. When this graph is deserialized, the functors are executed, and you're pwned.

There are a few points to take away from this description. The first, of course, is that uncontrolled deserialization is a Bad Idea. That doesn't mean that serialization is itself a bad idea; feel free to serialize and deserialize your internal data. Just be sure that you're not deserializing something that came from outside your application.

The second take-away is that code-as-data can be dangerous. As I said above, there's no way to put arbitrary code on your classpath, but the exploit doesn't rely on that: it relies on passing arbitrary data into your application. Clojure fans might not like the implications.

For that matter, pay attention to what goes onto your classpath! Apache Commons is an incredibly useful collection of utility classes; if you don't use them, it's likely that one of your dependencies does. Or you have a dependency that offers features that are just waiting to be exploited. My KDGCommons library, for example, has a DefaultMap; all it's missing is a reflection-based functor (and a much wider usage).

The final — and most important — take-away is to pay attention to the variables in your serialized classes. The attack vector here is not commons-collections, it's AnnotationInvocationHandler and its Map. If, instead, this member was defined as a concrete class such as HashMap, the exploit would not have worked: attempting to deserialize a LazyMap when a Hashmap is expected would cause an exception. I know, using a concrete class goes against everything that we're taught about choosing the least-specific type for a variable, but security takes precedence over purity (and you don't need to expose the concrete type via the object's API).

If you're not scared enough already, I'll leave you with this: in the src.zip for JDK 1.7.0_79, there are a little over 600 classes that implement Serializable. And there are a bunch of these that define a private Map. Any of those could be a vector for the same attack.

Saturday, July 12, 2014

Hotspot In Action

Yesterday I was writing some code that used the output of System.currentTimeMillis() as a database key. Yeah, yeah, that's a really bad idea, I know. But I figured that it was going to be a low-volume operation, and I added retry logic, and … I started to wonder about the resolution of the system clock, as returned by System.currentTimeMillis().

Returns the current time in milliseconds. Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds.

So, having some time to fill during a build, I wrote a program to determine the granularity of my operating system(s).

public static void main(String[] argv)
throws Exception
{
    ConcurrentHashMap<Long,AtomicLong> counters = new ConcurrentHashMap<Long,AtomicLong>();
    long finish = System.currentTimeMillis() + 1000;
    long current = 0;
    while ((current = System.currentTimeMillis()) < finish)
    {
        incrementCounter(counters, current);
    }

    System.out.println("number of counters = " + counters.size());
    TreeMap<Long,AtomicLong> displayMap = new TreeMap<Long,AtomicLong>(counters);
    for (Map.Entry<Long,AtomicLong> entry : displayMap.entrySet())
    {
        System.out.println(entry.getKey() + " = " + entry.getValue().longValue());
    }
}

private static void incrementCounter(ConcurrentHashMap<Long,AtomicLong> counters, long current)
{
    Long key = new Long(current);
    AtomicLong counter = counters.get(key);
    if (counter == null)
    {
        counters.putIfAbsent(key, new AtomicLong());
        counter = counters.get(key);
    }
    counter.incrementAndGet();
}

I'm not sure why I decided to use a Map, rather than simply storing the distinct timestamps in a Set (and yes, rather than roll my own counter map, I should have used an existing implementation). But the map gave me some interesting results:

number of counters = 1000
1405105663691 = 805
1405105663692 = 775
1405105663693 = 808
1405105663694 = 796
1405105663695 = 799
1405105663696 = 789
1405105663697 = 821
1405105663698 = 820
1405105663699 = 818
1405105663700 = 792
1405105663701 = 788
1405105663702 = 821
1405105663703 = 740
1405105663704 = 1103
1405105663705 = 1246
1405105663706 = 1921
1405105663707 = 2959
1405105663708 = 5960
1405105663709 = 6242
1405105663710 = 12867
1405105663711 = 13331
1405105663712 = 12883
1405105663713 = 13436
1405105663714 = 13457
1405105663715 = 19267
1405105663716 = 20521
1405105663717 = 21728

OK, first thing that jumps out is that Linux provides 1ms resolution (Windows is 10). But far more interesting is the jump in the number of counts captured: from approximately 800 in the first dozen measurements, then increasing in a few sharp jumps, ending up in the 20,000 range and staying there for the rest of the samples. This is a dramatic demonstration of the power of Hotspot to optimize code on the fly. If you want to see what it's really doing, here's a version with diagnostics turned on:

> java -cp target/classes/ -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation experiments.ClockResolution
     45   1       java.lang.Object:: (1 bytes)
     46   2       java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
---   n   java.lang.System::currentTimeMillis (static)
---   n   sun.misc.Unsafe::compareAndSwapLong
     47   3       java.util.concurrent.ConcurrentHashMap::segmentFor (17 bytes)
     47   4       java.lang.Long::hashCode (14 bytes)
     47   5       java.lang.Number:: (5 bytes)
     47   6       java.util.concurrent.ConcurrentHashMap$Segment::get (66 bytes)
     48   7       java.util.concurrent.ConcurrentHashMap::get (19 bytes)
Inlining intrinsic _hashCode (virtual) at bci:1 in java.util.concurrent.ConcurrentHashMap::get (19 bytes)
     49   8       java.util.concurrent.ConcurrentHashMap$Segment::getFirst (14 bytes)
     49   9       experiments.ClockResolution::incrementCounter (54 bytes)
     50  10       java.lang.Long:: (10 bytes)
     50  11       java.lang.Long::equals (30 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     51  12       java.lang.Long::longValue (5 bytes)
     51  13       java.util.concurrent.atomic.AtomicLong::incrementAndGet (23 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     51  14       java.util.concurrent.atomic.AtomicLong::get (5 bytes)
     51  15       java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     52   1%      experiments.ClockResolution::main @ 22 (159 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _currentTimeMillis at bci:28 in experiments.ClockResolution::main (159 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
   1033   1%     made not entrant  experiments.ClockResolution::main @ -2 (159 bytes)
number of counters = 1000
...

Tuesday, June 18, 2013

Perils of Reflection: Public Methods on Private Classes

Here's a puzzle: what does the following code print:

Iterator itx = new ArrayList().iterator();
Method method = itx.getClass().getMethod("hasNext");
System.out.println(method.invoke(itx));

You might expect it to print false: you're invoking the hasNext() method on an iterator over an empty list. Instead, you get a rather confusing exception::

Exception in thread "main" java.lang.IllegalAccessException: Class com.example.ReflectionFailure can not access a member of class java.util.AbstractList$Itr with modifiers "public"
    at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:65)
    at java.lang.reflect.Method.invoke(Method.java:588)
    at com.example.ReflectionFailure.main(ReflectionFailure.java:16)

Huh? IllegalAccessException indicates that the program doesn't have access to the method. But hasNext() is a public method, how can I not have access to it? The following code compiles and runs without a problem, aren't I doing the same thing with reflection?

Iterator itx = new ArrayList().iterator();
System.out.println(itx.hasNext());

Well, not exactly. If you look at the bytecode of the explicit example, you'll see that you're actually invoking Iterator.hasNext(), not the method defined by the implementation class:

15:  invokeinterface #32,  1; //InterfaceMethod java/util/Iterator.hasNext:()Z

And if you change the reflective code so that you get hasNext() from the Iterator class, it runs without a problem:

Iterator itx = new ArrayList().iterator();
Method method = Iterator.class.getMethod("hasNext");
System.out.println(method.invoke(itx));

But the first piece of code isn't doing that. Instead, it's trying to do something like the following:

AbstractList.Itr itx = (AbstractList.Itr)new ArrayList().iterator();

That code won't compile, because AbstractList.Itr is a private class; the compiler rightly complains that it isn't visible.

I'm not the first person to stumble over this. There are a number of bugs entered about this problem, although they seem to focus on the use of an inner class. The problem still manifests even if you have a top-level protected class in another package. There's also a posting on the Java Specialists blog that covers the problem and its workaround:

Iterator itx = new ArrayList().iterator();
Method method = Iterator.class.getMethod("hasNext");
method.setAccessible(true);
System.out.println(method.invoke(itx));

Method.setAccessible() tells the JVM to allow the calling program to bypass some of the normal security measures; its brother, Field.setAccessible() is a mainstay of injection frameworks and bean introspecters. There's only one problem: if you're running in a sandbox you aren't allowed to tell the JVM to ignore its safety features, you'll get a SecurityException instead.

The only always-usable solution is to figure out where the method is actually defined, and use the Method corresponding to that definition. This is not a trivial task: you have to walk the class hierarchy, looking at all of the concrete or abstract (but still accessible) classes in the hierarchy, as well as all of the interfaces that they implement.

So is this a bug in the JVM? If you couldn't use reflection to accomplish something that you could in explicit code, I'd say yes. In this case, however, you aren't mimicking actual code, you just think you are. So, while inconvenient, in my opinion not a bug.

Update: I've been researching how to determine whether or not a Method can be invoked without an access error. I assumed that the isAccessible() would work: if it returned true, I was good to go. I quickly learned that it always returns false, unless you've specifically called setAccessible(true). Not, in my opinion, terribly useful.

I've combed through the documentation several times, but can't find any method that will tell me if I can invoke a Method. I suppose this is reasonable: accessibility depends on the object doing the invoking. This means that any “find the accessible method” utility would have to take the invoking class as an argument, and there would be no guarantee that the returned method could be invoked by an instance of another class. Not worth pursuing.

Friday, August 31, 2012

If A Field Isn't Referenced, Does It Exist?

In my last post I said that, pre-annotations, finding all the classes referenced by a given class was a simple matter of scanning the constant pool for CONSTANT_Class entries. Turns out that it isn't quite that simple (which makes the last dependency analyzer I wrote not quite correct). Consider the following class:

public class ClassLoadExample
{
    private BigDecimal imNotReferenced;
    
    public void foo()
    {
//        imNotReferenced = new BigDecimal("123.45");
    }
    
    public static void main(String[] argv)
    throws Exception
    {
        System.err.println("main started");
        new ClassLoadExample().foo();
        System.err.println("foo called, main done");
    }
}

If you compile this and walk its constant pool, you won't find a CONSTANT_Class_info entry for BigDecimal. What you will find are two CONSTANT_Utf8_info entries, containing the name of the variable and its type. And you'll find an entry in the field list that references these constants.

Explicitly set the variable to null, and the constant pool gets two additional entries: a CONSTANT_Fieldref_info, and an associated CONSTANT_NameAndType_info entry that links the existing entries for the field's name and type. Initialize the field via the BigDecimal constructor, or invoke a method on the instance, and the expected CONSTANT_Class_info appears.

At first glance, this behavior seems like a WTF: you're clearly referencing BigDecimal, so why doesn't the constant pool reflect that fact? But if you think about how the JVM loads classes, it seems less a WTF and more a premature optimization … and also a reminder that every computer system carries with it the constraints present at its birth. Java was born in 1996, in a world where 256 Mb of RAM was a “workstation-class” machine, and CPU cycles were precious. Today, of course, there's more CPU/RAM in an obsolete smartphone.

To preserve memory and cycles, the JVM loads classes on an as-needed basis, nominally starting with the class holding your main() method (but see below). In order to initialize your class, the JVM will have to load its superclass and any interfaces, as well as any classes referenced in static initializers. But it doesn't have to load classes that are only referenced by member variables, because those classes won't get used until the member variable is first accessed — which might not ever happen. Even if you construct an instance of the class, there's no reason to load the class: the member variable is simply a few bytes in the instance that has been initialized to null. You don't need to load the class until you actually invoke a method on it.*

I said a premature optimization, but that's not right. The JDK has a lot of built-in classes: over 17,000 in rt.jar for JDK 1.6. You don't want to load all of them for every program, because only a relative few of them will ever be used. But one would think that, as machines became more capable, the Java compiler might add CONSTANT_Class_info entries for every referenced class, and the JVM might choose to preload those classes; the JVM spec doesn't say they couldn't. The JVM development team took a somewhat different approach, however, and decided to preload a bunch of “commonly used” classes. From a performance perspective, that no doubt makes more sense.

But for someone writing a dependency analyzer, it's a royal pain, unless you confine yourself to dependencies that are actually used.

* If you want to see classloading in action, start the JVM with the -XX:+TraceClassLoading flag. If you do this with the example program, you'll see the pre-loads, with the program class near the end. If you uncomment the assignment statement, you'll see that BigDecimal is loaded during the call to method foo().

Monday, August 13, 2012

How Annotations are Stored in the Classfile ... WTF?!?

This weekend I made some changes to BCELX, my library of enhancements for Apache BCEL. These changes were prompted not by a need to access different types of annotations, but because I'm currently working on a tool to find hidden and unnecessary dependency references in Maven projects. Why does this have anything to do with annotation processing? Read on.

Prior to JDK 1.5, the Java class file was a rather simple beast. Every referenced class had a CONSTANT_Class_info entry in the constant pool. This structure actually references another entry in the constant pool, which holds the actual class name, but BCEL provides the ConstantClass object so you don't have to chase this reference. It's very easy to find all the external classes that your program references: walk the constant pool and pull out the ConstantClass values.

That functionality is exactly what I needed to cross-check project dependencies. But when I wrote a testcase to check my dependency-extraction method, it failed. I had used the test class itself as my target, and just by chance I picked the @Test annotation as one of my assertions. As far as my dependency-extraction code was concerned, I didn't have a reference to the annotation class.

I figured that there must be some flag in the CONSTANT_Class_info structure that was confusing BCEL — its released version hasn't been updated for JDK 1.5. So I turned to the JDK 1.5 classfile doc, and it slowly dawned on me: annotation classes aren't referenced in the constant pool. Instead, you have to walk through all of the annotation attributes, and get their names out of the constant pool. OK, I should have realized this sooner; after all, it wasn't so long ago that I'd written the annotation parsing code in BCELX.

Of course, this meant that I now had to add support for parameter and field-level annotations to BCELX (I was going to have to parameters anyway, to support another project). While doing this, I discovered something else interesting: the API docs say that you can apply annotations to packages and local variables, but the classfile docs give no indication that this is actually supported.

There are a couple of things that I take from this experience. The first is that it's another piece of evidence that JDK 1.5 represented a changing of the guard at Sun. Annotations have often had a “tacked on” feel to me — right down to the @interface keyword (they broke backwards compatibility for enum, would it have been so bad to add annotation?). I'm sure there was a reason for not treating an annotation reference as just another class, but quite frankly I can't see it.

The other thing I learned is to beware testcases built around spot checks. If I had written my testcase to look for org.junit.Assert rather than org.junit.Test, I never would have found the issue — until it turned up when using the utility. But there are lots of cases where exhaustive checks aren't cost-effective. Including this one: should I write a test that verifies every possible annotation reference? I'll need to, if I want 100% coverage, but really: it's a tiny part of the overall project.

One that could have been far easier if the JVM team had cleanly integrated their changes, and followed the existing model. I suppose that's the real take-away: if you're evolving a design, don't simply tack on the changes.

blog.kdgregory.com