Friday, July 18, 2014

XPath String Comparisons

I recently learned something about XPath: its relational operators don't compare strings. As the XPath 1.0 specification puts it:

When neither object to be compared is a node-set and the operator is <=, <, >= or >, then the objects are compared by converting both objects to numbers and comparing the numbers according to IEEE 754.
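To see that rule in isolation, here is a minimal sketch that uses only the JDK's built-in javax.xml.xpath API. The class and method names (XPathNumberConversion, toNumber, test) are made up for illustration, and the empty context document is just a convenience, since none of the expressions read it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

// Hypothetical demo class: shows how XPath 1.0 treats string operands of <, <=, >, >=.
public class XPathNumberConversion
{
    // An empty document serves as the context node; the expressions below never read it.
    private static Document emptyDocument() throws Exception
    {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Applies XPath's number() function to the given expression.
    public static double toNumber(String expr) throws Exception
    {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double)xpath.evaluate("number(" + expr + ")", emptyDocument(), XPathConstants.NUMBER);
    }

    // Evaluates the given expression as an XPath boolean.
    public static boolean test(String expr) throws Exception
    {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean)xpath.evaluate(expr, emptyDocument(), XPathConstants.BOOLEAN);
    }

    public static void main(String[] argv) throws Exception
    {
        // a letter is not a valid number, so the conversion yields NaN
        System.out.println("number('C') = " + toNumber("'C'"));

        // both operands become NaN, and IEEE 754 comparisons involving NaN are false
        System.out.println("'C' >= 'B'  = " + test("'C' >= 'B'"));
        System.out.println("'C' <= 'D'  = " + test("'C' <= 'D'"));

        // numeric strings are compared as numbers, not character by character
        System.out.println("'10' > '9'  = " + test("'10' > '9'"));
    }
}
```

With the JDK's bundled XPath 1.0 implementation I'd expect this to print NaN for the conversion, false for both letter comparisons, and true for the last line, which would be false if the operands were compared as strings.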

Here's an example program that demonstrates the problem: given a set of records, select those where a specific code is in the desired range:*

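import java.util.List;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Practical XML
import net.sf.practicalxml.ParseUtil;
import net.sf.practicalxml.xpath.XPathWrapper;

public class XPathRange1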
{
    private final static String xml = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='5' code='E'>bargle</entry>"
                                    + "</root>";
    

    public static void main(String[] argv) throws Exception
    {
        Document dom = ParseUtil.parse(xml);
        
        List<Node> selection1 = new XPathWrapper("//entry[@code='C']").evaluate(dom);
        System.out.println("nodes selected by exact match = " + selection1.size());
        
        List<Node> selection2 = new XPathWrapper("//entry[(@code >= 'B') and (@code <= 'D')]").evaluate(dom);
        System.out.println("nodes selected by comparison  = " + selection2.size());
    }
}

When you run this, the equality test in the first path returns a single element, the entry with id 3. The second expression, which uses a range predicate, selects nothing, even though that same entry's code is within the specified range. Both 'B' and the attribute value convert to NaN, and any comparison involving NaN is false.

How can you solve this problem? One approach is to use an XPath 2.0 implementation, such as Saxon. But if you're stuck with XPath 1.0, a user-defined function that compares the values as Java strings will do the job:

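import java.util.List;
import java.util.UUID;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Practical XML
import net.sf.practicalxml.ParseUtil;
import net.sf.practicalxml.xpath.XPathWrapper;

public class XPathRange2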
{
    private final static String xml = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='5' code='E'>bargle</entry>"
                                    + "</root>";
    
    private final static String FUNCTION_NS = "urn:uuid:" + UUID.randomUUID();
    
    private static class XPathRange implements XPathFunction
    {
        @Override
        @SuppressWarnings("rawtypes")
        public Object evaluate(List args) throws XPathFunctionException
        {
            try
            {
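                // a node-set argument arrives as a NodeList, string literals as Strings
                java.util.Iterator argItx = args.iterator();
                NodeList selection = (NodeList)argItx.next();
                String lowerBound = (String)argItx.next();
                String upperBound = (String)argItx.next();
                
                // use the first selected node, or an empty string if nothing was selected
                String value = (selection.getLength() > 0) ? selection.item(0).getNodeValue() : "";
                // String.compareTo() is lexicographic; both bounds are inclusive
                return (value.compareTo(lowerBound) >= 0) && (value.compareTo(upperBound) <= 0);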
            }
            catch (Exception ex)
            {
                throw new XPathFunctionException(ex);
            }
        }
    }
    

    public static void main(String[] argv) throws Exception
    {
        Document dom = ParseUtil.parse(xml);
        
        List<Node> selection1 = new XPathWrapper("//entry[@code='C']").evaluate(dom);
        System.out.println("nodes selected by exact match = " + selection1.size());
        
        List<Node> selection2 = new XPathWrapper("//entry[fn:inRange(@code, 'B', 'D')]")
                                .bindNamespace("fn", FUNCTION_NS)
                                .bindFunction(new QName(FUNCTION_NS, "inRange"), new XPathRange(), 3)
                                .evaluate(dom);
        System.out.println("nodes selected by function    = " + selection2.size());
    }
}
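If you can't use a helper library, the same function can be wired up with plain JAXP. The sketch below assumes the JDK's built-in XPath implementation. The class name XPathRangeJaxp, the count() helper, and the namespace URI urn:example:xpath-functions are placeholders I made up, and turning off secure processing explicitly is a precaution on my part rather than something every JDK needs.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical plain-JAXP variant of the inRange() function shown above.
public class XPathRangeJaxp
{
    // placeholder namespace; any URI works as long as prefix and resolver agree
    private final static String FUNCTION_NS = "urn:example:xpath-functions";

    private final static String xml = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='5' code='E'>bargle</entry>"
                                    + "</root>";

    private static class InRange implements XPathFunction
    {
        @Override
        @SuppressWarnings("rawtypes")
        public Object evaluate(List args) throws XPathFunctionException
        {
            try
            {
                NodeList selection = (NodeList)args.get(0);
                String lowerBound = (String)args.get(1);
                String upperBound = (String)args.get(2);
                String value = (selection.getLength() > 0) ? selection.item(0).getNodeValue() : "";
                return (value.compareTo(lowerBound) >= 0) && (value.compareTo(upperBound) <= 0);
            }
            catch (Exception ex)
            {
                throw new XPathFunctionException(ex);
            }
        }
    }

    // Returns the number of nodes selected by the expression from the sample document.
    public static int count(String expr) throws Exception
    {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                       .parse(new InputSource(new StringReader(xml)));

        XPathFactory factory = XPathFactory.newInstance();
        // precaution: user-defined functions can be refused when secure processing is on
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
        XPath xpath = factory.newXPath();

        // maps the "fn" prefix used in the expression to the function namespace
        xpath.setNamespaceContext(new NamespaceContext()
        {
            public String getNamespaceURI(String prefix)
            {
                return "fn".equals(prefix) ? FUNCTION_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri)
            {
                return FUNCTION_NS.equals(uri) ? "fn" : null;
            }
            public Iterator<String> getPrefixes(String uri)
            {
                String prefix = getPrefix(uri);
                return (prefix == null) ? Collections.<String>emptyIterator()
                                        : Collections.singleton(prefix).iterator();
            }
        });

        // hands out the function when the expression asks for {FUNCTION_NS}inRange with 3 arguments
        xpath.setXPathFunctionResolver((name, arity) ->
            (FUNCTION_NS.equals(name.getNamespaceURI()) && "inRange".equals(name.getLocalPart()) && arity == 3)
            ? new InRange() : null);

        return ((NodeList)xpath.evaluate(expr, dom, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] argv) throws Exception
    {
        System.out.println("nodes selected by function = " + count("//entry[fn:inRange(@code, 'B', 'D')]"));
    }
}
```

The resolver and namespace context here do by hand what bindFunction() and bindNamespace() do in the version above, which is most of the boilerplate the library saves.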

* All examples in this post use the Practical XML library to reduce boilerplate code.
