blog.kdgregory.com: July 2013

Monday, July 29, 2013

Notes from OSCon

I spent last week at OSCon, the O'Reilly Open Source conference. It was my first time in Portland, and I prepared by watching segments from Portlandia on YouTube. I fell in love with the city, but had mixed feelings about the conference.

OSCon is big. Four and a half days of talks, with up to 18 different choices for each time slot. Topics range from Perl (it started as the Perl Conference) to “Geek Lifestyle.” My current focus on alternatives to Java drove many of my choices, but I tried to sample a few topics that were new to me. Here, then, are my top take-aways from the conference:

Invest in disk drive manufacturers

This point was driven home by Jay Parikh's keynote speech. He is the VP of infrastructure engineering at Facebook, which receives 7 petabytes of new user photos each month. Users have the expectation that those photos will always be available and never go away. In response, Facebook designed a dedicated storage rack, and a new building to house them: it will have three server bays, each holding 1 exabyte of storage.

One exabyte is 1,000,000 terabytes. Which, given 4TB drives, doesn't translate to that many physical units. But Facbook is only one part of the story. Worldwide data storage is counted in zettabytes (1,000 exabytes), and is growing exponentially. At least for the foreseeable future, traditional “spinning rust” drives are the most cost-efficient way to store that data. While there may not be a lot of profit in hard drives, volume is important. And the win will go to manufacturers that reduce electrical consumption: to support 1 exabyte of storage, Facebook is using 1.5 megawatts of electricity; a 20% improvement would be huge.

As an aside, this also tells me that Facebook is a screaming short: you can't sustain exponentially-growing costs indefinitely.

Java-the-Language really is fading

The Java and JVM track had exactly one presentation that was primary about Java-the-language, versus three that were primarily about Scala. More telling is to compare the Java track with JavaScript and Python: the latter were filled with talks about using the languages to solve problems, while the former featured frameworks.

OK, part of this may be selection bias specific to the conference. But it's another data point for my career planning.

“Clean Code” still isn't mainstream

In one talk that I attended, the presenter demonstrated how a relatively short Erlang function could be improved by breaking it into smaller pieces. While I enjoyed the presentation, I was compelled to point out that it was simply a restating of modular decomposition, a practice that has been part of software engineering since the 1970s. Or, applied after the fact, refactoring, which has been popular since the late 1990s.

I wonder if clean code is a habit that only comes from experience?

I prefer smaller conferences

While OSCon was big, with lots of sessions on many different tracks, I found it lacking: few of the talks provided insight, and many were downright boring (to the point that I walked out of a couple). This may have been due to my selection of talks, but I got a similar impression from other people that I talked to.

I think that part of the reason for this is that a conference the size of OSCon tries to cater to many needs, from the Perl programmers that were its core constituency, to people writing mobile apps. By comparison, a smaller conference (for example, ETE, which is produced by my employer) has the ability to focus: a few topics, but deep presentations on them.

Friday, July 19, 2013

What is Compelling?

For all that Java got right, I'm looking for something new. I can't claim that my motivation is a good one: I'm starting to feel that Java is déclassé, occupying a “mainstream niche” in which it provides database access to web applications. Yes, there are some interesting things happening with Big Data and Java, but most of the interesting projects that I've seen look to alternative languages even if they keep the JVM.

However, that presents the question “What is compelling about other languages?” Is there anything in the non-Java space that has a story similar the one I saw from Java in 1999? There has been a lot of language development in the last decade or so, with new languages appearing and older languages gaining in popularity. Most of these languages have features that Java doesn't. Do any of those features present a compelling case to leave Java behind?

Before continuing, I want to state that I do not consider one language to be more “powerful” than another. That term gets a lot of use and abuse, but it's meaningless — at least if both languages are Turing-complete. Instead, I look at a language based on how expressive it is for a particular purpose. For example, APL is an extremely expressive language for an extremely narrow purpose.

My purpose is best described as “applications that transform quantum data, possibly in a distributed fashion.” This is almost a simile for “general purpose” computing, but highlights a few things that I find important:

Applications have multiple parts, and will run for an extended period of time. This implies that I want easy modularization, and that performance is important.
Quantum data is a fancy way of saying that I want to work with individual pieces of information, on both the input and output side. So statistical analysis features are not terribly important to me, but data manipulation features are. As I side note, I don't accept the LISPer's view that data-is-code-is-data.
Transformation has similar implications.
My view of “distributed computing” has more to do with how work is partitioned, rather than where those pieces run. Features that support partitioning are important, particularly partitioning of independent concurrent tasks. Once you partition your processing, you can move those pieces anywhere.

So, given this mindset, what are some features that Java doesn't have and that I find interesting?

Interpreted versus Compiled

One of the big things that drew me to Java was its elimination of the traditional build cycle: I didn't have to take a nap while building the entire project. This point was driven home to me last year, talking with a neighbor who is doing iOS development. He was touting how fast the latest high-end Macs could build his projects; I pointed out how the IDE on my nine-year-old desktop PC made that irrelevant.

Build cycles haven't completely gone away with Java: deploying a large Spring web-app, for example, takes a noticeable amount of time. But deployment takes time, even with an “interpreted” framework like Rails. For that matter, client-side JavaScript requires a browser refresh. But saving a few seconds isn't really important.

The REPL (read-eval-print loop), on the other hand, is a big win for interpreted languages. I came of age as a programmer when interactive debugging was ascendant, and I like having my entire environment available to examine and modify.

Duck Typing

Another of the benefits of so-called “scripting” languages is that objects are malleable; they are not rigidly defined by class. If you invoke a method on an object, and that object supports the method, great. If not, what happens depends on the language; in many, an “unimplemented method handler” is called, allowing functionality to be added on the fly. I like that last feature; it was one of the things that drew me to Rails (although I didn't stay there). And I think that function dispatch in a duck-typed language is closer to the “objects respond to messages” ideal of OOP purists.

On a practical basis, I find myself wishing for duck-typing every time I have to repeat a long parameterized type specification in Java. Especially when I have to repeat that specification both to declare and assign a variable. I've often wished for something like typedef, to reduce typing and provide names for generic structures.

But … there's a certain level of comfort in knowing that my compiler will catch my misspellings; I make a lot of them. And at the extreme, the flexibility of this model is no different than the idea of a single method named doSomething that takes a map and returns a map.

Lambdas, Closures, Higher-order Functions

At some point in the last dozen years, function pointers became mainstream. I use that term intentionally: the idea of a function as a something that can be passed around by your program is fundamentally based on the ability to hold a pointer to that function. It's a technique that every C programmer knows, but few use (and if you've ever seen the syntax of a non-trivial function pointer, you know why). It's a feature that is present in Java, albeit with the boilerplate of a class definition and the need to rigidly (and repeatedly) define your function's signature. I think that, as a practical thing, duck typing makes lambdas easier: a function is just another malleable object.

Which is a shame, because the idea of a lambda is extremely useful. For example, reading rows from a JDBC query requires about a dozen lines of boilerplate code, which hides the code that actually processes the data. Ditto for file processing, or list processing in general.

While I like lambdas, closures quite frankly scare me. Especially if used with threads. But that's a topic for another post.

Lazy Evaluation

I don't believe that “functional” languages are solely about writing your code as a series of higher-order functions. To me, a critical part of the functional approach is the idea that functions will lazily evaluate their arguments. The example-du-jour calculates the squares of a limited set of integers. It appears to take the entire set of integers as an argument, but lazy evaluation means that those values are only created when needed.

As people have pointed out, lazy evaluation can be implemented using an iterator in Java. Something to think about is that it also resembles a process feeding a message queue.

Structured Data Abstraction

One of the things I like about Groovy is that everything looks like an object, even XML. Combined with the null-safe navigation operator, you can replace a dozen lines of code (or an XPath) with a very natural foo?.bar?.baz. If your primary goal is data manipulation, features like this are extremely valuable. Unfortunately, the list of languages that support them is very short.

A different concurrency paradigm

Concurrent programming isn't easy. Programming itself isn't easy: for any non-trivial application you have to maintain a mental model of how that application's internal state changes over time, and which parts get to change what state. A concurrent application raises the bar by introducing non-determinism: two (or more) free-running threads of execution that update the same state at what seem to be arbitrary times. Java provided a model for synchronizing these threads of execution, at least within a single process, but Java synchronization brings with it the specter of contention: threads sitting idle waiting for a mutex.

Concurrency has been a hidden issue for years: even a simple web-app may have concurrent requests for the same user, accessing the same session data. With the rise of multi-core processors, the issue will become more visible, especially as programmers try to exploit performance gains via parallel operations. While Java's synchronization tools work well for “accidentally concurrent” programs, I don't believe they're sufficient for full-on concurrent applications.

New languages have the opportunity to create new approaches to managing concurrency, and two of the leading approaches are transactional memory and actors. Of the two, I far prefer the latter.

Those are the features that interest me; other people will have different ones. But given those desired features, along with my overall goals for what I'm doing with computers, I've surveyed the landscape of languages and thought about which could replace Java for me. That's the topic of my next post.

Wednesday, July 17, 2013

Things that Java Got Right

Java turns 18 this year. I've been using it for 14 of those years, and as I've said before, I like it, both as a language and as a platform. Recently, I've been thinking more deeply about why I made the switch from C++ and haven't looked back.

One well-known if biased commentator said that Java's popularity was due to “the most intense marketing campaign ever mounted for a programming language,” but I don't think that's the whole story. I think there were some compelling reasons to use Java in 1995 (or, for me, 1999). And I'm wondering whether these reasons remain compelling in 2013.

Before going further, I realize that this post mixes Java the language with Java the platform. For the purposes of this post, I don't think they're different: Java-the-platform was created to support Java-the-language, and Java programs reflect this fact. Superficially, Java-the-language is quite similar to C or C++. Idiomatically, it is very different, and that is due to the platform.

Dynamic Dispatch

Both Java and C++ support polymorphism via dynamic dispatch: your code is written in terms of a base class, while the actual object instances are of different descendant classes. The language provides the mechanism by which the correct class‘ method is invoked, depending on the actual instance.

However, this is not the default mechanism for C++, static dispatch is. The compiler creates a mangled name for each method, combining class name, method name, and parameters, and writes a hard reference to that name into the generated code. If you want dynamic dispatch, you must explicitly mark a method as virtual. For those methods alone, the compiler invokes the method via a reference stored in the object's “V-table.”

This made sense given the performance goals of C++: a V-table reference adds (a tiny amount of) memory to each object instance, and the indirect method lookup adds (a tiny amount of) time to each invocation. As a result, while C++ supported polymorphic objects, it was a rare C++ application that actually relied on polymorphism (at least in my experience in the 1990s). It wasn't until I saw dozens of Java classes created by a parser generator that I truly understood the benefits of class-based polymorphism.

Today, the prevalence of “interpreted” languages means that dynamic dispatch is the norm — and in a far more flexible form than Java provides.

Late Binding

At the time Java appeared, most applications were statically linked: the compiler produced an object file filled with symbolic references, then the linker replaced these symbolic references with physical addresses when it produced the executable. Shared libraries existed, and were used heavily in the Windows world (qv “DLL hell”), but their primary purpose was to conserve precious RAM (because they could be shared between processes).

In addition to saving RAM, shared libraries had another benefit: since you linked them into the program at runtime, you could change your program's behavior simply by changing the libraries that you used. For example, if you had to support multiple databases you could write a configuration file that selected a specific data access library. At the time, I was working on an application that had to do just that, but we found it far easier to just build separate executables and statically link the libraries. Talking with other people, this was a common opinion. The Apache web server was an exception, although in its case the goal was again to save RAM by not linking modules that you didn't want; you also had an option to rebuild with statically-linked libraries.

Java, by comparison, always loads classes on an as-needed basis when the program runs. Changing the libraries that you use is simply a matter of changing your classpath. If you want, you can change a single classfile.

This has two effects: first, your build times are dramatically reduced. If you change one file, you can recompile just that file. When working on large C and C++ codebases, I've spent a lot of time optimizing builds, trying to minimize time spent staring at a scrolling build output.

The second effect is that the entire meaning of a “release” changes. With my last C++ product, bug fixes got rolled into the six-month release cycle; unless you were a large customer who paid a lot in support, you had to wait. With my first Java project, we would ship bugfixes to the affected customers as soon as the bug was fixed. There was no effort expended to “cut a release,” it was simply a matter of emailing classfiles.

Classloaders

Late binding was present in multiple other languages at the time Java appeared, but to the best of my knowledge, the concept of a classloader — a mechanism for isolating different applications within a single process — was unique to Java. It represented late binding on steroids: with a little classloading magic, and some discipline in how you wrote your applications, you could update your applications while they were running. Or, as in the case of applets, applications could be loaded, run, and then be discarded and their resources reclaimed.

Classloaders do more than simply isolate an application: the classloader hierarchy controls the interaction of separately-loaded applications (or, at least, groups of classes). This is a hard problem, and Java classloaders are an imperfect solution. OSGi tries to do a better job, but adds complexity to an application's deployment.

Threads

Threads are another thing that predated Java but didn't see a lot of use. For one thing, “threads” gives the impression of a uniform interface, which wasn't the case in the mid-1990s. I've worked with Posix threads, Solaris threads, and one other whose name I can't remember. They all had the same basic idea, but subtly different APIs and implementation. And even if you knew the APIs, you'd find that vendor-supplied code wasn't threadsafe. Faced with these obstacles, most Unix-centric programmers turned to the tried-and-true multi-process model.

But that decision led to design compromises. I think the InConcert server was typical in that it relied on the database to manage consistency between multiple processes. We could have gotten a performance boost by creating a cache in shared memory — but we'd have to implement our own allocator and coherence mechanism to make that work. One of the things that drove me to prototype a replacement in Java was its ability to use threads and a simple front-end cache.

I could contemplate this implementation because Java provided threads as a core part of the language, along with synchronization primitives for coordinating them. My cache was a simple synchronized Map: retrieval operations could probe the map without worrying that another thread was of updating it. Updates would clear the cache on their way out, meaning that the next read would reload it.

That said, today I've come to the belief that most code should be thread-agnostic, written as if it were the only thing running, and not sharing resources with other threads. Concurrent programming is hard, even when the language provides primitives for coordination, and in a massively-parallel world contention is the enemy. A shared-nothing mentality — at least for mutable state — and a queue-based communication model makes for far simpler programs and (I believe) higher overall performance.

Library Support

All of the above were neat, but I think the truly compelling reason to use Java in the 1990s was that it came with a huge standard library. You wanted basic data structures? They were in there. Database access? In there. A way to make HTTP requests? In there. Object-oriented client-server communication? Yep. A GUI? Ugly, but there.

This was a time when the STL wasn't yet ubiquitous; indeed, some C++ compiler vendors were just figuring out how to implement templates (causing no end of problems for cross-platform applications that tried to use them). The RogueWave library was popular, but you had to work for a company willing to buy it. I think every C++ programmer of the 1990s had his or her own string class, with varying levels of functionality and correctness. It was a rite of passage, the first thing you did once you figured out how classes worked.

Java's large — and ever-growing — library has been both a blessing and a curse. It's nice to have one standard way to do most tasks, from parsing XML to computing a cryptographic hash. On the other hand, in JDK 1.6 there are 17,484 classes in 754 packages. Many imported whole from third-party libraries. This is bloat for those who don't need the features. Worse, it creates friction and delay for updates: if you find a bug in the XML processor, should you file it with Oracle or Apache? And will it ever get fixed?

Those were the things that I found compelling about Java. Other people had different lists, but I think that everyone who adopted Java in the late 1990s and early 2000s did so for practical reasons. It wasn't simply Sun's marketing efforts, we actually believed that Java offered benefits that weren't available elsewhere.

The 2000s have been a time of change on the language scene: new languages have appeared, and some older languages have become more popular. What would make a compelling case for leaving Java behind?

Monday, July 15, 2013

Signs that Java is Fading

My website is losing traffic. It never had terribly high traffic, other than the days it was mentioned on Reddit or Hacker News, but in the last year that traffic has been cut in half. There are no doubt many reasons for this, but I'm going to take a dramatic interpretation: it's another indication that the time of Java is passing.

Bold words for someone who only gets a few hundred hits a day, but the character of those hits are changing. The two most-hit pages on my site have consistently been those on bytebuffers and reference objects: topics that are of interest to people doing “interesting” things, and not something that you'd use in a typical corporate web-app. Recently, these two have been eclipsed by articles on parsing XML and debugging out-of-memory errors.

This sense that Java isn't being used for “interesting” projects comes from other sources as well. Colleagues and friends are looking into other languages, some on the JVM and some not. One of the best recruiters I know, a long-time specialist in Java (and founder of the Philadelphia Java Users Group) is now emphasizing other languages on his home page.

Of course, Java isn't going to disappear any time soon. There are billions of dollars of sunk costs in Java software, and it needs to be maintained. And with hundreds of thousands of Java programmers in the job market, employers know that they can always staff their projects, old and new. Corporate inertia is all but impossible to overcome, as evidenced by the fact that you still see plenty of COBOL positions open.

Java is occasionally called “the COBOL of the 21st century,” and I've never found the comparison apt. But the position of Java today seems to be similar to that of COBOL in the 1980s: a huge installed base, huge demand for developers, but slotted into a specific application area. If you wanted to venture out of the accounting realm, you looked for something else.

Yesterday was my birthday. This upcoming January will mark 30 years that I've received a regular paycheck for making computers do what other people want. In that time, I've switched areas of expertise several times. I figure that I'm going to be in this business for another 15 to 20 years, and I want to be doing something interesting for those years. So it's time to re-evaluate Java, and think about what my next area of expertise should be.

Thursday, July 11, 2013

Is anyone else waiting for this?

Google Search will be retiring on October 23

Today we're announcing a new Connected Search feature for Google+. We believe that this will be an easier way to find the information you need. Once you add Search to your circles, you'll see every request you ever made, right there on your Google+ home page. As the search results change, you'll see those changes in real-time.

We realize that some of you have grown to like the existing Google Search feature, with its single text field and “I feel lucky” button. While we regret the need to discontinue this service, we think you'll find that Search on Google+ will serve all your needs.

blog.kdgregory.com