Friday, August 31, 2012

If A Field Isn't Referenced, Does It Exist?

In my last post I said that, pre-annotations, finding all the classes referenced by a given class was a simple matter of scanning the constant pool for CONSTANT_Class entries. Turns out that it isn't quite that simple (which makes the last dependency analyzer I wrote not quite correct). Consider the following class:

public class ClassLoadExample
{
    private BigDecimal imNotReferenced;
    
    public void foo()
    {
//        imNotReferenced = new BigDecimal("123.45");
    }
    
    public static void main(String[] argv)
    throws Exception
    {
        System.err.println("main started");
        new ClassLoadExample().foo();
        System.err.println("foo called, main done");
    }
}

If you compile this and walk its constant pool, you won't find a CONSTANT_Class_info entry for BigDecimal. What you will find are two CONSTANT_Utf8_info entries, containing the name of the variable and its type. And you'll find an entry in the field list that references these constants.

Explicitly set the variable to null, and the constant pool gets two additional entries: a CONSTANT_Fieldref_info, and an associated CONSTANT_NameAndType_info entry that links the existing entries for the field's name and type. Initialize the field via the BigDecimal constructor, or invoke a method on the instance, and the expected CONSTANT_Class_info appears.

At first glance, this behavior seems like a WTF: you're clearly referencing BigDecimal, so why doesn't the constant pool reflect that fact? But if you think about how the JVM loads classes, it seems less a WTF and more a premature optimization … and also a reminder that every computer system carries with it the constraints present at its birth. Java was born in 1996, in a world where 256 Mb of RAM was a “workstation-class” machine, and CPU cycles were precious. Today, of course, there's more CPU/RAM in an obsolete smartphone.

To preserve memory and cycles, the JVM loads classes on an as-needed basis, nominally starting with the class holding your main() method (but see below). In order to initialize your class, the JVM will have to load its superclass and any interfaces, as well as any classes referenced in static initializers. But it doesn't have to load classes that are only referenced by member variables, because those classes won't get used until the member variable is first accessed — which might not ever happen. Even if you construct an instance of the class, there's no reason to load the class: the member variable is simply a few bytes in the instance that has been initialized to null. You don't need to load the class until you actually invoke a method on it.*

I said a premature optimization, but that's not right. The JDK has a lot of built-in classes: over 17,000 in rt.jar for JDK 1.6. You don't want to load all of them for every program, because only a relative few of them will ever be used. But one would think that, as machines became more capable, the Java compiler might add CONSTANT_Class_info entries for every referenced class, and the JVM might choose to preload those classes; the JVM spec doesn't say they couldn't. The JVM development team took a somewhat different approach, however, and decided to preload a bunch of “commonly used” classes. From a performance perspective, that no doubt makes more sense.

But for someone writing a dependency analyzer, it's a royal pain, unless you confine yourself to dependencies that are actually used.


* If you want to see classloading in action, start the JVM with the -XX:+TraceClassLoading flag. If you do this with the example program, you'll see the pre-loads, with the program class near the end. If you uncomment the assignment statement, you'll see that BigDecimal is loaded during the call to method foo().

No comments: