Wednesday, July 17, 2013

Things that Java Got Right

Java turns 18 this year. I've been using it for 14 of those years, and as I've said before, I like it, both as a language and as a platform. Recently, I've been thinking more deeply about why I made the switch from C++ and haven't looked back.

One well-known if biased commentator said that Java's popularity was due to “the most intense marketing campaign ever mounted for a programming language,” but I don't think that's the whole story. I think there were some compelling reasons to use Java in 1995 (or, for me, 1999). And I'm wondering whether these reasons remain compelling in 2013.

Before going further, I realize that this post mixes Java the language with Java the platform. For the purposes of this post, I don't think they're different: Java-the-platform was created to support Java-the-language, and Java programs reflect this fact. Superficially, Java-the-language is quite similar to C or C++. Idiomatically, it is very different, and that is due to the platform.

Dynamic Dispatch
Both Java and C++ support polymorphism via dynamic dispatch: your code is written in terms of a base class, while the actual object instances are of different descendant classes. The language provides the mechanism by which the correct class‘ method is invoked, depending on the actual instance.

However, this is not the default mechanism for C++, static dispatch is. The compiler creates a mangled name for each method, combining class name, method name, and parameters, and writes a hard reference to that name into the generated code. If you want dynamic dispatch, you must explicitly mark a method as virtual. For those methods alone, the compiler invokes the method via a reference stored in the object's “V-table.”

This made sense given the performance goals of C++: a V-table reference adds (a tiny amount of) memory to each object instance, and the indirect method lookup adds (a tiny amount of) time to each invocation. As a result, while C++ supported polymorphic objects, it was a rare C++ application that actually relied on polymorphism (at least in my experience in the 1990s). It wasn't until I saw dozens of Java classes created by a parser generator that I truly understood the benefits of class-based polymorphism.

Today, the prevalence of “interpreted” languages means that dynamic dispatch is the norm — and in a far more flexible form than Java provides.

Late Binding
At the time Java appeared, most applications were statically linked: the compiler produced an object file filled with symbolic references, then the linker replaced these symbolic references with physical addresses when it produced the executable. Shared libraries existed, and were used heavily in the Windows world (qv “DLL hell”), but their primary purpose was to conserve precious RAM (because they could be shared between processes).

In addition to saving RAM, shared libraries had another benefit: since you linked them into the program at runtime, you could change your program's behavior simply by changing the libraries that you used. For example, if you had to support multiple databases you could write a configuration file that selected a specific data access library. At the time, I was working on an application that had to do just that, but we found it far easier to just build separate executables and statically link the libraries. Talking with other people, this was a common opinion. The Apache web server was an exception, although in its case the goal was again to save RAM by not linking modules that you didn't want; you also had an option to rebuild with statically-linked libraries.

Java, by comparison, always loads classes on an as-needed basis when the program runs. Changing the libraries that you use is simply a matter of changing your classpath. If you want, you can change a single classfile.

This has two effects: first, your build times are dramatically reduced. If you change one file, you can recompile just that file. When working on large C and C++ codebases, I've spent a lot of time optimizing builds, trying to minimize time spent staring at a scrolling build output.

The second effect is that the entire meaning of a “release” changes. With my last C++ product, bug fixes got rolled into the six-month release cycle; unless you were a large customer who paid a lot in support, you had to wait. With my first Java project, we would ship bugfixes to the affected customers as soon as the bug was fixed. There was no effort expended to “cut a release,” it was simply a matter of emailing classfiles.

Classloaders
Late binding was present in multiple other languages at the time Java appeared, but to the best of my knowledge, the concept of a classloader — a mechanism for isolating different applications within a single process — was unique to Java. It represented late binding on steroids: with a little classloading magic, and some discipline in how you wrote your applications, you could update your applications while they were running. Or, as in the case of applets, applications could be loaded, run, and then be discarded and their resources reclaimed.

Classloaders do more than simply isolate an application: the classloader hierarchy controls the interaction of separately-loaded applications (or, at least, groups of classes). This is a hard problem, and Java classloaders are an imperfect solution. OSGi tries to do a better job, but adds complexity to an application's deployment.

Threads
Threads are another thing that predated Java but didn't see a lot of use. For one thing, “threads” gives the impression of a uniform interface, which wasn't the case in the mid-1990s. I've worked with Posix threads, Solaris threads, and one other whose name I can't remember. They all had the same basic idea, but subtly different APIs and implementation. And even if you knew the APIs, you'd find that vendor-supplied code wasn't threadsafe. Faced with these obstacles, most Unix-centric programmers turned to the tried-and-true multi-process model.

But that decision led to design compromises. I think the InConcert server was typical in that it relied on the database to manage consistency between multiple processes. We could have gotten a performance boost by creating a cache in shared memory — but we'd have to implement our own allocator and coherence mechanism to make that work. One of the things that drove me to prototype a replacement in Java was its ability to use threads and a simple front-end cache.

I could contemplate this implementation because Java provided threads as a core part of the language, along with synchronization primitives for coordinating them. My cache was a simple synchronized Map: retrieval operations could probe the map without worrying that another thread was of updating it. Updates would clear the cache on their way out, meaning that the next read would reload it.

That said, today I've come to the belief that most code should be thread-agnostic, written as if it were the only thing running, and not sharing resources with other threads. Concurrent programming is hard, even when the language provides primitives for coordination, and in a massively-parallel world contention is the enemy. A shared-nothing mentality — at least for mutable state — and a queue-based communication model makes for far simpler programs and (I believe) higher overall performance.

Library Support
All of the above were neat, but I think the truly compelling reason to use Java in the 1990s was that it came with a huge standard library. You wanted basic data structures? They were in there. Database access? In there. A way to make HTTP requests? In there. Object-oriented client-server communication? Yep. A GUI? Ugly, but there.

This was a time when the STL wasn't yet ubiquitous; indeed, some C++ compiler vendors were just figuring out how to implement templates (causing no end of problems for cross-platform applications that tried to use them). The RogueWave library was popular, but you had to work for a company willing to buy it. I think every C++ programmer of the 1990s had his or her own string class, with varying levels of functionality and correctness. It was a rite of passage, the first thing you did once you figured out how classes worked.

Java's large — and ever-growing — library has been both a blessing and a curse. It's nice to have one standard way to do most tasks, from parsing XML to computing a cryptographic hash. On the other hand, in JDK 1.6 there are 17,484 classes in 754 packages. Many imported whole from third-party libraries. This is bloat for those who don't need the features. Worse, it creates friction and delay for updates: if you find a bug in the XML processor, should you file it with Oracle or Apache? And will it ever get fixed?

Those were the things that I found compelling about Java. Other people had different lists, but I think that everyone who adopted Java in the late 1990s and early 2000s did so for practical reasons. It wasn't simply Sun's marketing efforts, we actually believed that Java offered benefits that weren't available elsewhere.

The 2000s have been a time of change on the language scene: new languages have appeared, and some older languages have become more popular. What would make a compelling case for leaving Java behind?

No comments: