Friday, July 18, 2014

XPath String Comparisons

I recently learned something about XPath: relational operations don't compare strings.

When neither object to be compared is a node-set and the operator is <=, <, >= or >, then the objects are compared by converting both objects to numbers and comparing the numbers according to IEEE 754.

Here's an example program that demonstrates the problem: given a set of records, select those where a specific code is in the desired range:*

public class XPathRange1
{
    private final static String xml = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='4' code='E'>bargle</entry>" 
                                    + "</root>";
    

    public static void main(String[] argv) throws Exception
    {
        Document dom = ParseUtil.parse(xml);
        
        List<Node> selection1 = new XPathWrapper("//entry[@code='C']").evaluate(dom);
        System.out.println("nodes selected by exact match = " + selection1.size());
        
        List<Node> selection2 = new XPathWrapper("//entry[(@code >= 'B') and (@code <= 'D')]").evaluate(dom);
        System.out.println("nodes selected by comparison  = " + selection2.size());
    }
}

When you run this, the equality test in the first path returns a single element, the entry with id 1. The second expression, which use a range predicate, selects nothing — even though the expected code is within the specified range.

How to solve this problem? One approach is to use an XPath 2.0 implementation, such as Saxon. But, assuming that you're stuck with XPath 1.0, this is where a user-defined function will do the job:

public class XPathRange2
{
    private final static String xml = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='4' code='E'>bargle</entry>" 
                                    + "</root>";
    
    private final static String FUNCTION_NS = "urn:uuid:" + UUID.randomUUID();
    
    private static class XPathRange implements XPathFunction
    {
        @Override
        @SuppressWarnings("rawtypes")
        public Object evaluate(List args) throws XPathFunctionException
        {
            try
            {
                java.util.Iterator argItx = args.iterator();
                NodeList selection = (NodeList)argItx.next();
                String lowerBound = (String)argItx.next();
                String upperBound = (String)argItx.next();
                
                String value = (selection.getLength() > 0) ? selection.item(0).getNodeValue() : "";
                return (value.compareTo(lowerBound) >= 0) && (value.compareTo(upperBound) <= 0);
            }
            catch (Exception ex)
            {
                throw new XPathFunctionException(ex);
            }
        }
    }
    

    public static void main(String[] argv) throws Exception
    {
        Document dom = ParseUtil.parse(xml);
        
        List<Node> selection1 = new XPathWrapper("//entry[@code='C']").evaluate(dom);
        System.out.println("nodes selected by exact match = " + selection1.size());
        
        List<Node> selection2 = new XPathWrapper("//entry[fn:inRange(@code, 'B', 'D')]")
                                .bindNamespace("fn", FUNCTION_NS)
                                .bindFunction(new QName(FUNCTION_NS, "inRange"), new XPathRange(), 3)
                                .evaluate(dom);
        System.out.println("nodes selected by function    = " + selection2.size());
    }
}

* All examples in this post use the Practical XML library to reduce boilerplate code.

Saturday, July 12, 2014

Hotspot In Action

Yesterday I was writing some code that used the output of System.currentTimeMillis() as a database key. Yeah, yeah, that's a really bad idea, I know. But I figured that it was going to be a low-volume operation, and I added retry logic, and … I started to wonder about the resolution of the system clock, as returned by System.currentTimeMillis().

Returns the current time in milliseconds. Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds.

So, having some time to fill during a build, I wrote a program to determine the granularity of my operating system(s).

public static void main(String[] argv)
throws Exception
{
    ConcurrentHashMap<Long,AtomicLong> counters = new ConcurrentHashMap<Long,AtomicLong>();
    long finish = System.currentTimeMillis() + 1000;
    long current = 0;
    while ((current = System.currentTimeMillis()) < finish)
    {
        incrementCounter(counters, current);
    }

    System.out.println("number of counters = " + counters.size());
    TreeMap<Long,AtomicLong> displayMap = new TreeMap<Long,AtomicLong>(counters);
    for (Map.Entry<Long,AtomicLong> entry : displayMap.entrySet())
    {
        System.out.println(entry.getKey() + " = " + entry.getValue().longValue());
    }
}

private static void incrementCounter(ConcurrentHashMap<Long,AtomicLong> counters, long current)
{
    Long key = new Long(current);
    AtomicLong counter = counters.get(key);
    if (counter == null)
    {
        counters.putIfAbsent(key, new AtomicLong());
        counter = counters.get(key);
    }
    counter.incrementAndGet();
}

I'm not sure why I decided to use a Map, rather than simply storing the distinct timestamps in a Set (and yes, rather than roll my own counter map, I should have used an existing implementation). But the map gave me some interesting results:

number of counters = 1000
1405105663691 = 805
1405105663692 = 775
1405105663693 = 808
1405105663694 = 796
1405105663695 = 799
1405105663696 = 789
1405105663697 = 821
1405105663698 = 820
1405105663699 = 818
1405105663700 = 792
1405105663701 = 788
1405105663702 = 821
1405105663703 = 740
1405105663704 = 1103
1405105663705 = 1246
1405105663706 = 1921
1405105663707 = 2959
1405105663708 = 5960
1405105663709 = 6242
1405105663710 = 12867
1405105663711 = 13331
1405105663712 = 12883
1405105663713 = 13436
1405105663714 = 13457
1405105663715 = 19267
1405105663716 = 20521
1405105663717 = 21728

OK, first thing that jumps out is that Linux provides 1ms resolution (Windows is 10). But far more interesting is the jump in the number of counts captured: from approximately 800 in the first dozen measurements, then increasing in a few sharp jumps, ending up in the 20,000 range and staying there for the rest of the samples. This is a dramatic demonstration of the power of Hotspot to optimize code on the fly. If you want to see what it's really doing, here's a version with diagnostics turned on:

> java -cp target/classes/ -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation experiments.ClockResolution
     45   1       java.lang.Object:: (1 bytes)
     46   2       java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
---   n   java.lang.System::currentTimeMillis (static)
---   n   sun.misc.Unsafe::compareAndSwapLong
     47   3       java.util.concurrent.ConcurrentHashMap::segmentFor (17 bytes)
     47   4       java.lang.Long::hashCode (14 bytes)
     47   5       java.lang.Number:: (5 bytes)
     47   6       java.util.concurrent.ConcurrentHashMap$Segment::get (66 bytes)
     48   7       java.util.concurrent.ConcurrentHashMap::get (19 bytes)
Inlining intrinsic _hashCode (virtual) at bci:1 in java.util.concurrent.ConcurrentHashMap::get (19 bytes)
     49   8       java.util.concurrent.ConcurrentHashMap$Segment::getFirst (14 bytes)
     49   9       experiments.ClockResolution::incrementCounter (54 bytes)
     50  10       java.lang.Long:: (10 bytes)
     50  11       java.lang.Long::equals (30 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     51  12       java.lang.Long::longValue (5 bytes)
     51  13       java.util.concurrent.atomic.AtomicLong::incrementAndGet (23 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     51  14       java.util.concurrent.atomic.AtomicLong::get (5 bytes)
     51  15       java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
     52   1%      experiments.ClockResolution::main @ 22 (159 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _currentTimeMillis at bci:28 in experiments.ClockResolution::main (159 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
Inlining intrinsic _compareAndSwapLong at bci:9 in java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
   1033   1%     made not entrant  experiments.ClockResolution::main @ -2 (159 bytes)
number of counters = 1000
...

Friday, July 11, 2014

Namespaces Aren't Just Attributes

Here's some code that builds an XML document and prints it (you'll find OutputUtil in the Practical XML library; it's a lot easier than configuring a serializer):

private static Document createDom() throws Exception
{
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    Document dom = dbf.newDocumentBuilder().newDocument();
    
    Element root = dom.createElement("data:root");
    dom.appendChild(root);
    root.setAttribute("xmlns:data", "http://www.example.com");
    root.setTextContent("foo");
    
    return dom;
}

public static void main(String[] argv)
throws Exception
{
    Document dom = createDom();
    System.out.println(OutputUtil.compactString(dom));
}

If you run this code, it produces the following output:

<data:root xmlns:data="http://www.example.com">foo</data:root>

Looks good; if you were using this output to debug a program, you might think that createDom() was doing the right thing. And you might be very confused to discover that your XPath expressions weren't returning anything (again, I'm using Practical XML to reduce boilerplate):

public static void main(String[] argv)
throws Exception
{
    Document dom = createDom();
    XPathWrapper xpath = new XPathWrapper("//data:root").bindNamespace("data", "http://www.example.com");
    System.out.println(xpath.evaluateAsString(dom));
}

At this point you might blame the library, and use the XPath implementation provided by the JDK. And you'd get the same result. To debug this problem, you need to be more creative:

public static void main(String[] argv)
throws Exception
{
    Document dom = createDom();
    Element root = dom.getDocumentElement();
    System.out.println("nodeName      = " + root.getNodeName());
    System.out.println("namespace URI = " + root.getNamespaceURI());
    System.out.println("prefix        = " + root.getPrefix());
    System.out.println("localname     = " + root.getLocalName());
}

Run this and you'll see the following output:

nodeName      = data:root
namespace URI = null
prefix        = null
localname     = null

What's happening here is that creating an xmlns attribute doesn't affect a node's namespace. The DOM — in Java and other language bindings — requires you to create the element with an explicit namespace-aware method:

Element root = dom.createElementNS("http://www.example.com", "data:root");

But what do you do if the createDom() method was written by some other team, and they don't want to listen to you telling them they're doing things wrong? If you can't get them fired, you can always serialize and reparse the XML:

public static void main(String[] argv)
throws Exception
{
    Document dom1 = createDom();
    String xml = OutputUtil.compactString(dom1);
    Document dom2 = ParseUtil.parse(xml);
    
    Element root = dom2.getDocumentElement();
    System.out.println("nodeName      = " + root.getNodeName());
    System.out.println("namespace URI = " + root.getNamespaceURI());
    System.out.println("prefix        = " + root.getPrefix());
    System.out.println("localname     = " + root.getLocalName());
}