Monday, September 20, 2010

BigMemory

Last week Terracotta released BigMemory, which stores objects outside of the Java heap and “defuses the GC time bomb.” Since I've written the top-ranked relevant article when you Google for “non-heap memory” (it's on page 2), my website has seen a spike in traffic. Unlike previous spikes, people have been poking around, which is nice to see.

I did some poking around myself, both on Terracotta's blogs and in some of the Twitter posts and articles that linked to my site. And I saw a lot of comments that disturbed me. They ranged from (paraphrased) “this is just hype, it isn't anything new,” to “I wonder if Oracle will file a patent lawsuit on this,” to “the JVM should be doing this itself,” to the ever-popular “people should just learn to write efficient code.” I'll take on that last one first.

But first some caveats: I've never used EHCache outside of Hibernate, and even then I simply followed a cookbook. My closest contact with EHCache development was attending a Java Users Group meeting with a badly jetlagged Greg Luck, followed by beer with him and a bunch of other people. I can't remember what was discussed around the table, but I don't think it was cache internals. But I do know something about caches, and about the way that the JVM manages memory, so I feel at least qualified to comment on the comments.

And my first comment is: a cache is efficient code … when appropriately used. You can spend days micro-optimizing your code, only to find it all wasted with a single trip to the database.* You use a cache when it's expensive to retrieve or create often-used data; the L1 and L2 caches in your processor exist because memory is relatively slow, so it's expensive to retrieve. Similarly, it's expensive to create a web page (and load its data from the database), so you would use EHCache (or Akamai) to cache the pre-built page (EHCache is cheaper).

The interface to a cache is pretty straightforward: you give it a key, it gives you a value. In the case of the processor's L1 cache, the key is an address, the value is the bytes around that address. In the case of memcached, the key is a string and the value is a blob. In Java, this corresponds to a Map, and you can create a simple cache using a LinkedHashMap.
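As a minimal sketch of that key-in, value-out interface, here is a read-through cache over a plain HashMap. The class name SimpleCache and the loader function (standing in for an expensive source such as a database query) are hypothetical. Note that nothing is ever evicted, so this cache grows without bound.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** A minimal, unbounded read-through cache backed by a HashMap (illustration only). */
public class SimpleCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final Function<K, V> loader; // hypothetical expensive source, e.g. a database lookup

    public SimpleCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader and storing its result only on a miss. */
    public V get(K key) {
        return map.computeIfAbsent(key, loader);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the "expensive" loader runs
        SimpleCache<String, String> cache = new SimpleCache<>(k -> {
            loads[0]++;
            return "page for " + k;
        });
        cache.get("home");
        cache.get("home");
        System.out.println(cache.get("home") + ", loads=" + loads[0]); // prints "page for home, loads=1"
    }
}
```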

The reason to use a LinkedHashMap, versus a regular HashMap, is eviction: removing old entries from the map. Although memory is cheap, it's not infinite, and there's no good reason to keep every piece of data ever accessed. For example, consider Stack Overflow: when a question is on the home page, it might get hundreds of views; clearly you don't want to keep reading it from the database. But after it drops off the home page, it might get viewed once a week, so you don't want to keep it in memory.

With LinkedHashMap, you can implement a simple eviction strategy: once the map reaches a certain size, you remove the head entry (the oldest entry, or the least-recently-used one if the map is kept in access order) on each new put(). More sophisticated caches use more sophisticated strategies: for example, weighting the life of a cached object by the amount of work that it takes to load from source. In my opinion, this is why you go with a pre-built solution like EHCache: they've thought through the eviction strategies and neatly packaged them.
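Here is one way to express that simple strategy, assuming a fixed entry count as the size limit. LinkedHashMap's protected removeEldestEntry() hook is consulted after each put(), and constructing the map with accessOrder set to true makes the head entry the least-recently-used one rather than the oldest insertion. The LruCache class and its maxEntries limit are illustrative, and this is far simpler than what EHCache provides (no cost weighting, no expiry, no thread safety).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A size-bounded cache built on LinkedHashMap. With accessOrder = true, iteration runs
 * from least- to most-recently accessed, so the "eldest" (head) entry is the LRU one.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // illustrative capacity limit

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order rather than insertion order
        this.maxEntries = maxEntries;
    }

    /** Consulted by put() after an insertion; returning true removes the head entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "A");
        cache.put("b", "B");
        cache.get("a");      // touch "a", so "b" becomes the least recently used
        cache.put("c", "C"); // exceeds capacity: evicts "b"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```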

So that's how a cache works. What's interesting is how it interacts with the garbage collector. The point of a cache is to hold content in memory for as long as it's valuable; for a generational garbage collector like Sun's, this means it will end up in the tenured generation … but eventually get removed. And this causes two issues.

First is the cost to find the garbage. The Hotspot GC is a “mark-sweep” collector: it starts at “root” references and traces through the entire object graph to find live objects; everything else is garbage (this is all covered in my article on reference objects; go back and click the link). If you have tens or hundreds of millions of objects in your heap (and for an eCommerce site, that's not an unreasonable number), it's going to take some time to find them all: a few milliseconds. This is, as I understand it, the price that you pay for any collection, major or minor.
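To illustrate why this cost grows with the number of live objects, here is a toy model of the mark phase: starting from a set of roots, it follows references using a work list and marks everything it reaches; anything left unmarked is garbage. The Node and MarkPhase classes are hypothetical, and a real collector such as HotSpot's works on raw memory with far more machinery, but the shape is the same: the marking work is proportional to the reachable objects and their references, so a heap full of live cache entries is expensive to trace.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** A toy model of the mark phase of a tracing collector (not how HotSpot is implemented). */
public class MarkPhase {
    /** A heap "object" that holds references to other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Traces from the roots and returns every reachable (live) object. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (marked.add(n)) { // first visit: mark it, then trace its references
                for (Node child : n.refs) {
                    work.push(child);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node orphan = new Node("orphan");
        root.refs.add(a);
        a.refs.add(b);
        b.refs.add(a);      // cycles are fine: marked objects aren't revisited
        orphan.refs.add(a); // points into live data, but nothing points to it
        Set<Node> live = mark(List.of(root));
        for (Node n : List.of(root, a, b, orphan)) {
            System.out.println(n.name + (live.contains(n) ? ": live" : ": garbage"));
        }
    }
}
```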

A major collection, however, performs one additional step: it compacts the heap after collecting the garbage. This is a big deal. No matter how fast your CPU and memory, it takes time to move gigabytes of memory around, particularly since the JVM must also update every reference to each moved object. But if you've ever written a C application that fragmented its heap, you'll thank the Hotspot team every time you see a major collection take place.
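The reference-updating step is what makes compaction more than a memory copy. Here is a toy sliding compactor, assuming a heap modeled as an array of objects that refer to each other by index, plus a live[] array produced by a prior mark phase (so live objects only reference live objects). It computes a forwarding address for each live object, rewrites every reference using those addresses, and then slides the objects down. The names are hypothetical, and this is not HotSpot's implementation.

```java
import java.util.Arrays;

/** A toy sliding compactor over an array "heap"; objects refer to each other by index. */
public class SlidingCompactor {
    /** A heap object whose references are indices into the heap array. */
    static final class Obj {
        final String name;
        final int[] refs;

        Obj(String name, int... refs) {
            this.name = name;
            this.refs = refs;
        }
    }

    /** Moves live objects to the front of the heap and returns the new top index. */
    static int compact(Obj[] heap, boolean[] live) {
        // Pass 1: compute a forwarding address for each live object.
        int[] forward = new int[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            forward[i] = live[i] ? next++ : -1;
        }
        // Pass 2: rewrite every reference held by a live object to its target's new address.
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) {
                for (int r = 0; r < heap[i].refs.length; r++) {
                    heap[i].refs[r] = forward[heap[i].refs[r]];
                }
            }
        }
        // Pass 3: slide live objects down, then clear the freed tail.
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) {
                heap[forward[i]] = heap[i];
            }
        }
        Arrays.fill(heap, next, heap.length, null);
        return next;
    }

    public static void main(String[] args) {
        Obj[] heap = {
            new Obj("a", 2), // index 0: live, refers to c
            new Obj("dead"), // index 1: garbage
            new Obj("c", 0), // index 2: live, refers back to a
        };
        boolean[] live = {true, false, true};
        int top = compact(heap, live);
        for (int i = 0; i < top; i++) {
            System.out.println(i + ": " + heap[i].name + " -> " + Arrays.toString(heap[i].refs));
        }
    }
}
```

Because each live object moves to an index no higher than its original one, sliding in ascending order never overwrites an object that hasn't moved yet.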

Or maybe you won't. Heap compaction is an artifact of tuning the JVM for turn-of-the-century computers, where 1GB of main memory was “server class,” and swap (the pagefile) was an important part of making your computer usable for multiple applications. Compacting the heap was a way to reduce the resident set size, and improve the overall performance of an application.**

Today, of course, we have desktop-class machines with multiple gigabytes, running an OS and JVM that allow direct access to all of it (and more). And there's been a lot of work in the past few decades to reduce the risk of fragmentation with C-style memory allocation. So maybe it is time to rethink garbage collection in the JVM, at least for the tenured generation, and introduce freelists (of course, that will inflame the “Java is a memory hog” crowd). Or maybe it's time to introduce a new generation, for large tenured objects. Maybe in JDK 1.8. I'm not holding my breath, and neither, it seems, were the EHCache developers.
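To make the off-heap idea concrete, here is a toy sketch of storing cache values in memory that the garbage collector never traces or moves, reusing freed space through a free list instead of compaction. It assumes fixed-size slots, String values, and single-threaded use; OffHeapStore and its methods are hypothetical, and this is not BigMemory's design. The only real APIs used are ByteBuffer.allocateDirect() and ByteBuffer's absolute get/put methods.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** A toy off-heap store: values live in fixed-size slots of a direct ByteBuffer. */
public class OffHeapStore {
    private final ByteBuffer buffer; // allocated outside the Java heap
    private final int slotSize;      // illustrative fixed slot size, in bytes
    private final Deque<Integer> freeSlots = new ArrayDeque<>(); // free list of slot numbers
    private final Map<String, Integer> index = new HashMap<>();  // on-heap key -> slot map

    public OffHeapStore(int slots, int slotSize) {
        this.buffer = ByteBuffer.allocateDirect(slots * slotSize);
        this.slotSize = slotSize;
        for (int i = 0; i < slots; i++) {
            freeSlots.add(i);
        }
    }

    /** Stores a value; returns false if it is too large for a slot or no slot is free. */
    public boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length + Integer.BYTES > slotSize) {
            return false;
        }
        Integer slot = index.get(key);
        if (slot == null) {
            slot = freeSlots.poll(); // take a slot from the free list
            if (slot == null) {
                return false;
            }
            index.put(key, slot);
        }
        int offset = slot * slotSize;
        buffer.putInt(offset, bytes.length); // length prefix
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(offset + Integer.BYTES + i, bytes[i]);
        }
        return true;
    }

    /** Returns the value for a key, or null if absent. */
    public String get(String key) {
        Integer slot = index.get(key);
        if (slot == null) {
            return null;
        }
        int offset = slot * slotSize;
        byte[] bytes = new byte[buffer.getInt(offset)];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buffer.get(offset + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Removes a key and returns its slot to the free list; nothing is moved. */
    public void remove(String key) {
        Integer slot = index.remove(key);
        if (slot != null) {
            freeSlots.push(slot);
        }
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(2, 64);
        store.put("q1", "question one");
        store.put("q2", "question two");
        System.out.println(store.put("q3", "question three")); // prints false: no free slot
        store.remove("q1");
        System.out.println(store.put("q3", "question three")); // prints true: reuses q1's slot
        System.out.println(store.get("q3"));                   // prints question three
    }
}
```

Fixed-size slots avoid fragmentation at the cost of wasted space for small values; a real store would likely use several slot sizes or a smarter allocator.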

So to summarize: EHCache has taken techniques that are well-known, and applied them to the Java world. If I were still working in the eCommerce field, I would consider this a Good Thing, and be glad that they invested the time and effort to make it work. Oh, and as far as patents go, I suspect that IBM invented memory-mapped files sometime in the 1960s, so no need to worry.


* Or, in the words of one of my favorite authors: “No matter how subtle the wizard, a knife between the shoulder blades will seriously cramp his style.”

** If you work with Swing apps, you'll often find a minimize handler that explicitly calls System.gc(); this assumes that the app's pages will be swapped out while minimized, and is an attempt to reduce restore time.

Friday, September 17, 2010

Project Management

I recently had a revelation: the first time that I worked with a Project Manager (a person whose sole role is maintaining a schedule and coordinating the tasks on that schedule) was in 2002. For nearly 20 years of my career, I worked on teams where project management was a subsidiary role of the team lead or development manager. True, my career has mostly been spent at small companies, some of which couldn't afford a dedicated project manager. But there were also a few larger ones, including GE, which you'd expect to be a bastion of project management and rigorous checklist checkers.

Before continuing, I want to say that, unlike many developers, I don't disdain project management per se. I've worked on projects that have succeeded (or at least failed less badly) because a talented project manager pulled together people with diverging goals, people who might have otherwise ignored or actively undercut one another. I've also worked on projects where the project manager seemed to be actively inflaming the participants. Either way, it's a role with impact, one that cannot be ignored.

So why did I spend two thirds of my career without ever seeing a project manager? I think the answer is that the structure of software development organizations changed over that time, along with the companies where they reside. And that's not necessarily a Good Thing.

But first, a little history. Corporate management, as we know it today, didn't exist before the mid-1800s. Prior to that time, businesses were small and generally confined to a single location; a few hundred employees was an industrial giant. The railroads changed all that: they hired thousands of employees, for a myriad of functions, and those employees were dispersed across the thousands of miles of terrain served by the railroad.

Up to that point, management relied on instant, face-to-face communication between front office and factory floor. This simply was not going to work for the railroads. In response, they adopted and adapted the hierarchical structure of the military, and even some of its terminology. The corporation was now composed of semi-autonomous divisions, which took strategic direction from the home office, but had freedom in tactical operations. Each division had its own complement of functional organizations such as maintenance shops, and those functional organizations kept largely to themselves.

This model worked well for the railroads, and for the giant industrial corporations that followed. You can even see the functional structure embodied in the layout of a manufacturing plant. And it permeated the thinking of the people working for those corporations: at GE in the late 1980s I received a five-minute dressing-down from a mid-level manager, for daring to use a photocopy machine that belonged to his group. Even in the software industry, the hierarchical mindset prevailed: as you read The Mythical Man-Month, you won't find “project” managers, just managers.

So where do these project managers come from? I think the answer is construction.

Whether you hire a general contractor for your home remodel, or Bechtel for a billion-dollar highway project, you get a project manager. And they're necessary: the construction industry is fragmented into dozens of different trades, and specialties within trades, even at the level of home repair. Carpenters, electricians, plumbers, masons, sheetrock installers, painters, tilers, landscape designers, and so on … you need all of them, and none of them does the others' jobs. And more important, each works for only a small part of the project schedule, and then they're gone. And if they don't start at exactly the right time, the whole project gets delayed.

It works for construction, so why not software?

In the 1980s and 1990s, corporations started to adopt “matrix management.” The reason was simple economics: self-contained organizations waste money. Just as you wouldn't want to pay a sheetrock crew to sit idle while the carpenters are building stud walls, most organizations don't want to pay a DBA to sit idle while the developers write front-end code. So the DBA team gets matrixed to the project team: when the project team needs a DBA, one will be assigned.

From the company's perspective, this maximizes employee utilization. And from the DBA's perspective, it's a better career path: rather than being isolated in a product-specific development team, she gets to work with her peers and have her work recognized by a manager who doesn't think that mauve databases have more RAM. Everybody wins.

But something I noticed, when working with matrix organizations, is that you could never find a DBA when you needed one — or, as matrix management spread, any other specialist. They always seemed to have other projects demanding their time. Perhaps that was really true: for a corporation wanting to reduce costs, why stop with sharing people, why not understaff as well? But I also noticed that you could always get a DBA to turn up for meetings where there were donuts present.

And what I inferred from this is that matrix management creates a disincentive to project loyalty. After all, the specialist career path depends more on pleasing the specialist manager than the project lead. In the best of cases, specialists can cherry-pick projects that catch their interest, ignoring the rest. In the worst cases, there are lots of places to hide in a matrix organization.

This effect goes deeper than a few DBAs, however. In a fully-matrixed organization, project teams are ad hoc. You no longer have developers who are working on a product; they work on a project. And when it's done (or fails), they move on to another project, taking with them the in-depth knowledge of what they did and what they should have done. Long-term loyalty simply doesn't exist.

And with the creation of ad hoc teams, you need an ad hoc manager: the project manager. So to reiterate, it's not that project managers are bad per se, it's what their presence says about the organization that disturbs me.