Monday, October 18, 2010

Living Among Primitives, part 1

There are a lot of people who say that Java shouldn't have primitive data types, that everything should be an object. And there is some truth to their arguments. Most business applications don't do intensive numeric operations, and for those that do, double is a poor substitute for a fully-thought-out Money. You could even claim that the presence of primitives leads to a procedural style of code: things like using an index variable rather than an iterator to access a list. And languages such as Ruby and Python follow the “everything's an object” path without apparent ill effect.

But I feel that primitives are in fact one of the strong points of the Java language, and languages that don't have native primitives often sprout hacks to replace them. Because there are several reasons for primitives to exist.

One is performance. It's true that, for simple expressions, evaluated once, you can't tell the difference. But there's a reason that libraries such as NumPy, which performs matrix math (among other things), are written in C or C++ rather than Python.

There's a primitive at the heart of any numeric object — after all, that's what the CPU works with. So when you invoke a method on a numeric object, not only do you have the overhead of a function call, but you're calling into native code. True, Hotspot could recognize mathematical operations and translate them into inline native code. But that still leaves the cost of allocating and collecting objects that are used only during the evaluation of a function. Unless Hotspot recognized those and transformed them into, well, primitives.

Another, often more important reason to use primitives is memory consumption: an int takes 4 bytes, but on a 32-bit Sun JVM, an Integer adds 8 bytes of overhead, there's also overhead to reference that Integer: 4 bytes on a 32-bit JVM, 8 bytes on a 64-bit JVM. Again, that doesn't seem like much, unless your application is dealing with hundreds of millions of numeric values.

And hundreds of millions of numeric values is exactly what I'm dealing with on my latest project … to the point where I recently told a colleague that “this application should be written in C.” The next few posts will dive into some of the issues that you'll face when living among primitives.

No comments: