Today I managed to hit myself over the head with a cluebat, proof that there's always something to learn and that you should never accept dogma. In this case, the dogma was that interned strings never get garbage-collected. Like all good dogma, it combined a few facts with a leap of faith, and was plausible enough that I never challenged it.
The facts first (actually, only one fact): interned strings are stored in the same pool as literal strings. This is explicitly stated in the JavaDoc for String.intern(), and can be demonstrated with the following code:
    String a = "Are we having fun yet?";
    String b = new String(a);
    System.out.println(a == b);     // false: b is a distinct object on the heap
    String c = b.intern();
    System.out.println(a == c);     // true: intern() returns the pooled literal
And now the leap of faith: the JVM doesn't clean up the constant pool. Seems plausible: after all, two identical string literals are guaranteed to be the same object. And intern() is a native method, so it must be doing something tricky behind the scenes. And everybody else says you'll cause bugs if you intern too many strings, so …
A skeptic might ask “how can you tell that two string literals aren't the same if you don't have references to both?” I even said as much when I wrote about canonicalizing maps (an article that got some edits today). Once all references to a string go out of scope (including any references within a class definition), then there's no need to keep that string in the pool. And the JVM doesn't — at least, the Sun JVM doesn't.
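One way to watch this happen directly, offered as a rough sketch rather than a definitive test: intern a string that was built at runtime (so no class constant refers to it), keep only a WeakReference to the pooled instance, and ask for a collection. The class and method names here (InternCollectionSketch, buildUnique, isPooledInstance, internedStringWasCollected) are made up for illustration, and since System.gc() is only a request, the result can differ between JVMs and collector settings.

```java
import java.lang.ref.WeakReference;

// Hypothetical sketch: names are illustrative, not from any library.
final class InternCollectionSketch {

    // Builds a string at runtime, so no literal in any class keeps it alive.
    static String buildUnique() {
        return new StringBuilder("interned-").append(System.nanoTime()).toString();
    }

    // True if s is the instance held by the string pool. Interning an equal
    // copy returns the pooled instance; if that is not s, s was not pooled.
    // (Side effect: an unpooled value gets its copy added to the pool.)
    static boolean isPooledInstance(String s) {
        return new String(s).intern() == s;
    }

    // Interns a runtime-built string, drops the only strong reference, then
    // requests GC up to 'attempts' times. Returns true if the pooled instance
    // was reclaimed. System.gc() is a hint, so false proves nothing.
    static boolean internedStringWasCollected(int attempts) {
        String pooled = buildUnique().intern();
        WeakReference<String> ref = new WeakReference<>(pooled);
        pooled = null;
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("literal is pooled:  " + isPooledInstance("Are we having fun yet?"));
        System.out.println("interned collected: " + internedStringWasCollected(20));
    }
}
```

If the second line reports true, the pool let go of the string once nothing else referred to it. A false is inconclusive, because the JVM is free to ignore the GC request.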
This particular cluebat entered the picture because I'm currently writing an article on out-of-memory exceptions, and wanted a program to demonstrate permgen failures. So I wrote a loop that interned big, random-content strings … and nothing happened. I killed the program after I finally realized that it wasn't going to die on its own.
But the dogma must have some basis in fact, right? It happens that I have a machine with Sun JVMs from version 1.2 on up. So I ran my test program on each revision, and while the -verbose:gc output changed, the result did not: all of these versions appear to clean up the string pool. Is it possible that a 1.1 release is the source of this dogma? Perhaps, and if someone still has one installed, here's the program:
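    public class InternExhaustion {

        public static void main(String[] argv) throws Exception {
            // Intern large random strings forever; if the pool were never
            // cleaned up, this would eventually run out of memory.
            while (true) {
                String str = generateRandomString(65536);
                str.intern();
            }
        }

        // Returns a string of the given length built from random characters
        // in the range ' ' (32) through 127.
        private static String generateRandomString(int length) {
            char[] chars = new char[length];
            for (int ii = 0 ; ii < length ; ii++)
                chars[ii] = (char)(96 * Math.random() + ' ');
            return new String(chars);
        }
    }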
For myself, I have some edits to make. And a lump on the head to remind me to question dogma.