Saturday, July 12, 2014

Hotspot In Action

Yesterday I was writing some code that used the output of System.currentTimeMillis() as a database key. Yeah, yeah, that's a really bad idea, I know. But I figured that it was going to be a low-volume operation, and I added retry logic, and … I started to wonder about the resolution of the system clock, as returned by System.currentTimeMillis().

Returns the current time in milliseconds. Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds.

So, having some time to fill during a build, I wrote a program to determine the granularity of my operating system(s).

public static void main(String[] argv)
throws Exception
{
    ConcurrentHashMap<Long,AtomicLong> counters = new ConcurrentHashMap<Long,AtomicLong>();
    long finish = System.currentTimeMillis() + 1000;
    long current = 0;
    while ((current = System.currentTimeMillis()) < finish)
    {
        incrementCounter(counters, current);
    }

    System.out.println("number of counters = " + counters.size());
    TreeMap<Long,AtomicLong> displayMap = new TreeMap<Long,AtomicLong>(counters);
    for (Map.Entry<Long,AtomicLong> entry : displayMap.entrySet())
    {
        System.out.println(entry.getKey() + " = " + entry.getValue().longValue());
    }
}

private static void incrementCounter(ConcurrentHashMap<Long,AtomicLong> counters, long current)
{
    Long key = new Long(current);
    AtomicLong counter = counters.get(key);
    if (counter == null)
    {
        counters.putIfAbsent(key, new AtomicLong());
        counter = counters.get(key);
    }
    counter.incrementAndGet();
}

I'm not sure why I decided to use a Map, rather than simply storing the distinct timestamps in a Set (and yes, rather than roll my own counter map, I should have used an existing implementation). But the map gave me some interesting results:

number of counters = 1000
1405105663691 = 805
1405105663692 = 775
1405105663693 = 808
1405105663694 = 796
1405105663695 = 799
1405105663696 = 789
1405105663697 = 821
1405105663698 = 820
1405105663699 = 818
1405105663700 = 792
1405105663701 = 788
1405105663702 = 821
1405105663703 = 740
1405105663704 = 1103
1405105663705 = 1246
1405105663706 = 1921
1405105663707 = 2959
1405105663708 = 5960
1405105663709 = 6242
1405105663710 = 12867
1405105663711 = 13331
1405105663712 = 12883
1405105663713 = 13436
1405105663714 = 13457
1405105663715 = 19267
1405105663716 = 20521
1405105663717 = 21728

OK, first thing that jumps out is that Linux provides 1ms resolution (Windows is 10). But far more interesting is the jump in the number of counts captured: from approximately 800 in the first dozen measurements, then increasing in a few sharp jumps, ending up in the 20,000 range and staying there for the rest of the samples. This is a dramatic demonstration of the power of Hotspot to optimize code on the fly. If you want to see what it's really doing, here's a version with diagnostics turned on:

> java -cp target/classes/ -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation experiments.ClockResolution
     45   1       java.lang.Object:: (1 bytes)
     46   2       java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
---   n   java.lang.System::currentTimeMillis (static)
---   n   sun.misc.Unsafe::compareAndSwapLong
     47   3       java.util.concurrent.ConcurrentHashMap::segmentFor (17 bytes)
     47   4       java.lang.Long::hashCode (14 bytes)
     47   5       java.lang.Number:: (5 bytes)
     47   6       java.util.concurrent.ConcurrentHashMap$Segment::get (66 bytes)
     48   7       java.util.concurrent.ConcurrentHashMap::get (19 bytes)
Inlining intrinsic _hashCode (virtual) at bci:1 in java.util.concurrent.ConcurrentHashMap::get (19 bytes)
     49   8       java.util.concurrent.ConcurrentHashMap$Segment::getFirst (14 bytes)
     49   9       experiments.ClockResolution::incrementCounter (54 bytes)
     50  10       java.lang.Long:: (10 bytes)
     50  11       java.lang.Long::equals (30 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     51  12       java.lang.Long::longValue (5 bytes)
     51  13       java.util.concurrent.atomic.AtomicLong::incrementAndGet (23 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     51  14       java.util.concurrent.atomic.AtomicLong::get (5 bytes)
     51  15       java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     52   1%      experiments.ClockResolution::main @ 22 (159 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _currentTimeMillis at bci:28 in experiments.ClockResolution::main (159 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
   1033   1%     made not entrant  experiments.ClockResolution::main @ -2 (159 bytes)
number of counters = 1000
...

No comments: