Tuesday, September 29, 2009

Intern Isn't Forever ... And Maybe Never Was

Today I managed to hit myself over the head with a cluebat, proof that there's always something to learn and that you should never accept dogma. In this case, the dogma was that interned strings never get garbage-collected. Like all good dogma, it combined a few facts with a leap of faith, and was plausible enough that I never challenged it.

The facts first (actually, only one fact): interned strings are stored in the same pool as literal strings. This is explicitly stated in the JavaDoc for String.intern(), and can be demonstrated with the following code:

String a = "Are we having fun yet?";
String b = new String(a);
System.out.println(a == b);     // false: b is a distinct object on the heap

String c = b.intern();
System.out.println(a == c);     // true: intern() returns the pooled literal

And now the leap of faith: the JVM doesn't clean up the constant pool. Seems plausible: after all, two literal strings are guaranteed to be the same. And intern() is a native method, so it must be doing something tricky behind the scenes. And everybody else says you'll cause bugs if you intern too many strings, so …

A skeptic might ask “how can you tell that two string literals aren't the same if you don't have references to both?” I even said as much when I wrote about canonicalizing maps (an article that got some edits today). Once all references to a string go out of scope (including any references within a class definition), then there's no need to keep that string in the pool. And the JVM doesn't — at least, the Sun JVM doesn't.
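One way to observe this for a single string is to intern a value built at runtime, track it through a WeakReference, and then request a collection. The following is a minimal sketch under some assumptions: the class and method names are made up for illustration, the diagnostic relies on System.gc(), which is only a request that a JVM may ignore, and the outcome can differ between JVM vendors, versions, and settings.

```java
import java.lang.ref.WeakReference;

class InternCollectionSketch
{
    // Built at runtime, so the value is not a compile-time literal
    // and no class definition holds a reference to it.
    public static String buildUniqueString()
    {
        return "intern-demo-" + System.nanoTime();
    }

    // Interns a string and returns only a weak reference to the pooled
    // instance; no strong reference survives this method.
    public static WeakReference<String> internAndTrack()
    {
        String interned = buildUniqueString().intern();
        return new WeakReference<String>(interned);
    }

    // Requests garbage collection up to `attempts` times and reports
    // whether the referent was cleared. System.gc() is only a hint.
    public static boolean clearedAfterGc(WeakReference<?> ref, int attempts)
    {
        for (int ii = 0 ; ii < attempts && ref.get() != null ; ii++)
        {
            System.gc();
            try
            {
                Thread.sleep(50);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] argv)
    {
        WeakReference<String> ref = internAndTrack();
        System.out.println("interned string collected: " + clearedAfterGc(ref, 10));
    }
}
```

If the pool held a strong reference to every interned string, the weak reference could never be cleared. A string that is still strongly referenced elsewhere should survive the same treatment, which makes a useful control.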

This particular cluebat entered the picture because I'm currently writing an article on out-of-memory errors, and wanted a program to demonstrate permgen failures. So I wrote a loop that interned big, random-content strings … and nothing happened. I killed the program after I finally realized that it wasn't going to die on its own.

But the dogma must have some basis in fact, right? It happens that I have a machine with Sun JVMs from version 1.2 on up. So I ran my test program on each revision, and while the -verbose:gc output changed, the result did not: all of these versions appear to clean up the string pool. Is it possible that a 1.1 release is the source of this dogma? Perhaps, and if someone still has one installed, here's the program:

public class InternExhaustion
{
    public static void main(String[] argv)
    throws Exception
    {
        while (true)
        {
            // intern each string, then drop the reference on the next iteration
            String str = generateRandomString(65536);
            str.intern();
        }
    }
    private static String generateRandomString(int length)
    {
        char[] chars = new char[length];
        for (int ii = 0 ; ii < length ; ii++)
            chars[ii] = (char)(96 * Math.random() + ' ');
        return new String(chars);
    }
}

For myself, I have some edits to make. And a lump on the head to remind me to question dogma.
