Monday, March 19, 2012

Synchronizing put() is Not Sufficient

Have you ever seen a class like this, which synchronizes put() but not get()? Have you ever written one?

public class MyConcurrentlyAccessedLookupTable
{
    private Map<String,MyDataObject> data = new HashMap<String,MyDataObject>();

    // ...

    // unsynchronized read: this is the bug
    public MyDataObject get(String key)
    {
        return data.get(key);
    }

    public synchronized void put(String key, MyDataObject value)
    {
        data.put(key, value);
    }
}

I've been working over a legacy codebase recently, and have seen a lot of them. And I want to track down the original developer, give him a good shake, and say “this only works because you never put anything into the map after initializing it!” Actually, this is one of the least egregious ways that he (yes, he, I know his name) managed to fail at concurrent programming. Don't get me started on his abuse of static initializers.

OK, it feels good to let that out. I can at least understand why he thought it would work. A HashMap, by its nature, is mostly resilient to incorrectly managed concurrent access. Under normal operation, new items get added to the front of a bucket's linked list, so a concurrent get() should either see the new entry or not see it. But with two concurrent put() calls, two new entries might get added to the same bucket list; if the scheduler switches threads between the read of the list head and the write of the new entry, one of the entries simply disappears.
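That lost-update interleaving can be simulated deterministically, without real threads. Here's a sketch using a toy bucket list (the names are illustrative, not HashMap's actual internals); the two "threads" are interleaved by hand so that both read the bucket head before either writes:

```java
import java.util.Objects;

public class LostUpdateDemo {
    // A bucket is a singly-linked list of entries, new entries at the head.
    static class Node {
        final String key;
        final String value;
        final Node next;
        Node(String key, String value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    static Node bucketHead = null;

    static String get(String key) {
        for (Node n = bucketHead; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        // Thread A and thread B both read the (empty) bucket head...
        Node seenByA = bucketHead;
        Node seenByB = bucketHead;
        // ...then each links its new entry to the head it saw.
        bucketHead = new Node("a", "1", seenByA);  // A's write
        bucketHead = new Node("b", "2", seenByB);  // B's write clobbers A's
        System.out.println(get("a"));  // null: A's entry has vanished
        System.out.println(get("b"));  // 2
    }
}
```

The real race is rarer, of course, because the two writes have to land in the same bucket at nearly the same instant; that's exactly what makes it so hard to reproduce.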

And then there's resize: doubling the size of the hash table to reduce the depth of the bucket chains. And, at least in the Sun implementation, there's no attempt to preserve the chains during this operation (I can't see why any implementation would, but there might be one). So put() could get a chain using the old table, but that chain could be assigned to a different bucket by the time the new entry is attached.

OK, so in both cases get() fails to find an entry that is actually in the map. Is that really so bad? Well, is it bad if you spend days trying to figure out why a transaction failed when the data was obviously present? Or if that failed transaction caused your company to lose a sale?

The problem with concurrency bugs is that they aren't easily reproducible. Sometimes they appear once and never again. But they're still bugs, and they're not that difficult to avoid: when in doubt, synchronize all access to shared state. If you truly understand your access patterns (e.g., you fully initialize the map and afterward only read it), you don't need to synchronize at all.

But never go halfway.
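For completeness, here's a sketch of the "synchronize everything" fix: both get() and put() lock the same monitor, so a reader can never observe the map mid-update. The class follows the shape of the example above, with String values to keep it self-contained:

```java
import java.util.HashMap;
import java.util.Map;

public class SafeLookupTable {
    private final Map<String,String> data = new HashMap<String,String>();

    // every access path is synchronized, readers included
    public synchronized String get(String key) {
        return data.get(key);
    }

    public synchronized void put(String key, String value) {
        data.put(key, value);
    }

    public synchronized int size() {
        return data.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // stress test: 8 threads, 1000 distinct keys each
        final SafeLookupTable table = new SafeLookupTable();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        table.put(id + ":" + i, "value");
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) thread.join();
        // with fully synchronized access, no entries are lost
        System.out.println(table.size());  // 8000
    }
}
```

Collections.synchronizedMap() gets you the same effect with less typing, as long as every caller goes through the wrapper.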


Corey Gaffney said...

ConcurrentHashMap, wouldn't this work? Please be easy on me, I am very new to Java. :)

kdgregory said...

Yes, ConcurrentHashMap should be your first choice if you need to share a map between threads. However, it's only been available since Java 5, and there's a lot of old code that still shares a HashMap. And it has a few quirks of its own (if you're doing a lot of updates, you'll find that it generates a lot of garbage for the GC).
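To make that concrete, here's the lookup table rewritten around ConcurrentHashMap; the map does its own locking, so no synchronized keyword is needed for the map operations themselves:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentLookupTable {
    private final ConcurrentMap<String,String> data =
            new ConcurrentHashMap<String,String>();

    public String get(String key) {
        return data.get(key);
    }

    public void put(String key, String value) {
        data.put(key, value);
    }

    // atomic check-then-insert: no separate containsKey()/put() race;
    // returns the existing value, or null if the key was absent
    public String putIfAbsent(String key, String value) {
        return data.putIfAbsent(key, value);
    }

    public static void main(String[] args) {
        ConcurrentLookupTable table = new ConcurrentLookupTable();
        table.put("x", "1");
        System.out.println(table.get("x"));              // 1
        System.out.println(table.putIfAbsent("x", "2")); // 1 (already present)
        System.out.println(table.get("x"));              // 1, unchanged
    }
}
```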

Be aware, though: CHM on its own won't eliminate concurrency bugs. If you store mutable objects in the map, you still need to synchronize all code that reads or writes those objects (but not the map accesses). If you make the objects in the map immutable, then you don't have to explicitly synchronize at all.
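One way to apply the immutable-objects approach: instead of mutating a value stored in the map, build a new object and use replace() to swap it in atomically, retrying if another thread got there first. The Counter class here is purely illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ImmutableValueDemo {
    // all fields final: once constructed, a Counter never changes
    static final class Counter {
        final int count;
        Counter(int count) { this.count = count; }
        Counter increment() { return new Counter(count + 1); }
    }

    public static void main(String[] args) {
        ConcurrentMap<String,Counter> map =
                new ConcurrentHashMap<String,Counter>();
        map.put("hits", new Counter(0));

        // retry loop: re-read and replace until no other thread
        // has swapped in a new value between our read and our write
        for (;;) {
            Counter current = map.get("hits");
            if (map.replace("hits", current, current.increment())) break;
        }
        System.out.println(map.get("hits").count);  // 1
    }
}
```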