Monday, October 27, 2014

JDK Bug (Maybe): Namespaced XPath Variables

I'm revising my XPath article, and stumbled onto some strange behavior with namespaced variables.

For those not intimately familiar with the XPath spec, a VariableReference is a dollar sign followed by a qualified name. A qualified name may include a namespace prefix, and the spec requires that prefixes be associated with namespace declarations.

Here are two example paths, one with a namespace prefix and one without.

    //baz[@name=$myname]
    //baz[@name=$ns:myname]

Seems simple enough; let's see some example code. SimpleNamespaceResolver is a class from the Practical XML library that manages a single-entry namespace context. I use it here because a complete and correct implementation of NamespaceContext would be a distraction.

import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;

import org.w3c.dom.Document;

import org.xml.sax.InputSource;

import net.sf.practicalxml.xpath.SimpleNamespaceResolver;


public class VariableExample
{
    public static void main(String[] argv) throws Exception
    {
        String xml = "<foo>"
                   + "    <bar name='argle'>"
                   +          "Argle"
                   + "    </bar>"
                   + "    <bar name='bargle'>"
                   +          "Bargle"
                   + "        <baz>Baz</baz>"
                   + "    </bar>"
                   + "</foo>";
        
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new SimpleNamespaceResolver("ns", "foo"));
        xpath.setXPathVariableResolver(new XPathVariableResolver()
        {
            @Override
            public Object resolveVariable(QName var)
            {
                System.out.println("prefix    = " + var.getPrefix());
                System.out.println("namespace = " + var.getNamespaceURI());
                System.out.println("localName = " + var.getLocalPart());
                if (var.getLocalPart().equals("name"))
                    return "argle";
                else
                    return "";
            }
        });        
        
        String result = xpath.evaluate("//bar[@name=$ns:name]", dom);
        
        System.out.println("selected content: \"" + result + "\"");
    }
}

When we run this, it produces the following output:

prefix    = 
namespace = foo
localName = name
selected content: "Argle    "

The prefix has disappeared, but that's OK: we have the namespace. However, what if we comment out the setNamespaceContext() call?

prefix    = 
namespace = ns
localName = name
selected content: "Argle    "

The prefix has now become the namespace, without an exception being thrown.

Is this a real problem? I searched the Java Bug Parade and Apache Bug Database and didn't see anyone reporting it as an issue, so I have to assume the answer is “no.”

Perhaps nobody uses namespaced variable references in the real world. I think avoiding them is a good idea on principle: if you have so many variables that you need namespaces, your XPath expressions are probably too complex.

And given the possibility that a misspelling will cause your expressions to silently fail, I think it's a good idea in practice as well.
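If you do need namespaced variables, one way to avoid the silent failure is a resolver that looks variables up by their full qualified name and refuses anything it doesn't know. What follows is only a sketch: StrictVariableResolver and its bind() method are names I made up for illustration, not part of the JDK or Practical XML. It relies on the fact that QName.equals() compares the namespace URI and local part and ignores the prefix.

```java
import java.util.HashMap;
import java.util.Map;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPathVariableResolver;

// Hypothetical helper: resolves only variables that were explicitly bound
// under a namespace URI, and fails loudly for everything else.
public class StrictVariableResolver implements XPathVariableResolver
{
    private final Map<QName,Object> bindings = new HashMap<>();

    // Binds a value to the variable identified by namespace URI + local name.
    public StrictVariableResolver bind(String namespaceUri, String localName, Object value)
    {
        bindings.put(new QName(namespaceUri, localName), value);
        return this;
    }

    @Override
    public Object resolveVariable(QName var)
    {
        // the lookup matches on namespace URI and local part; the prefix is ignored
        Object value = bindings.get(var);
        if (value == null)
            throw new IllegalArgumentException("unbound variable: " + var);
        return value;
    }
}
```

With a binding such as new StrictVariableResolver().bind("foo", "name", "argle"), the second run above (where the namespace came through as "ns") would hit the exception rather than quietly selecting a node. How that exception surfaces from XPath.evaluate() depends on the implementation, so treat that part as an assumption.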

Tuesday, August 12, 2014

Scalarific FizzBuzz

This started as a lunchtime conversation about interviewing. I'm a big fan of FizzBuzz as a “screening” question: it weeds out the people who shouldn't be allowed near a computer. After conducting several hundred interviews, I can say that there's a depressingly large number of them, especially at a company without a preliminary phone screen.

For a Scala developer, what constitutes a good FizzBuzz? Clearly, it should be based around higher-order functions, such as a map() (and, as I don't consider myself a Scala developer, I'll leave the parentheses and dots in place):

(1 to 20).map(???)

A simple implementation might use an if expression:

def fbi(x: Int): String = {
    if ((x % 15) == 0) "fizzbuzz"
    else if ((x % 3) == 0) "fizz"
    else if ((x % 5) == 0) "buzz"
    else s"$x"
}

It gets the job done, but looks too much like Java. We need to add some Scala-specific syntax:

def fbm(x: Int): String = x match {
    case n if ((n % 15) == 0) => "fizzbuzz"
    case n if ((n % 3) == 0) => "fizz"
    case n if ((n % 5) == 0) => "buzz"
    case n => s"$n"
}

You can argue whether this is better or worse. It seems to me that it just wraps the previous if with more cruft.* As our conversation devolved, though, it led to the following implementation, which is about as far from Java as I can imagine:

object Fizz {
    def unapply(x: Int): Boolean = ((x % 3) == 0)
}
  
object Buzz {
    def unapply(x: Int): Boolean = ((x % 5) == 0)
}
  
object FizzBuzz {
    def unapply(x: Int): Boolean = ((x % 15) == 0)
}
 
def fbme(x: Int): String = x match {
    case FizzBuzz() => "fizzbuzz"
    case Fizz() => "fizz"
    case Buzz() => "buzz"
    case n => s"$n"
}    

Mind you, I don't think I'd want to hire someone who implemented FizzBuzz this way.


* You'll find a nicer match-based implementation at RosettaCode. Along with some versions that make my extractor-based implementation look sane.

Friday, July 18, 2014

XPath String Comparisons

I recently learned something about XPath: relational operations don't compare strings. From the XPath 1.0 spec:

When neither object to be compared is a node-set and the operator is <=, <, >= or >, then the objects are compared by converting both objects to numbers and comparing the numbers according to IEEE 754.

Here's an example program that demonstrates the problem: given a set of records, select those where a specific code is in the desired range:*

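import java.util.List;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

import net.sf.practicalxml.ParseUtil;
import net.sf.practicalxml.xpath.XPathWrapper;


public class XPathRange1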
{
    private final static String xml = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='5' code='E'>bargle</entry>"
                                    + "</root>";
    

    public static void main(String[] argv) throws Exception
    {
        Document dom = ParseUtil.parse(xml);
        
        List<Node> selection1 = new XPathWrapper("//entry[@code='C']").evaluate(dom);
        System.out.println("nodes selected by exact match = " + selection1.size());
        
        List<Node> selection2 = new XPathWrapper("//entry[(@code >= 'B') and (@code <= 'D')]").evaluate(dom);
        System.out.println("nodes selected by comparison  = " + selection2.size());
    }
}

When you run this, the equality test in the first path returns a single element, the entry with id 3. The second expression, which uses a range predicate, selects nothing, even though the code 'C' is within the specified range.
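The reason is the number conversion: a string that doesn't look like a number converts to NaN, and every relational comparison involving NaN is false. Here's a small JDK-only sketch that shows the rule in isolation. The class name RelationalConversion and its helper methods are mine, purely for illustration; the expressions don't depend on any document content, so they're evaluated against an empty document.

```java
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

// Hypothetical demo class: evaluates context-free XPath 1.0 expressions.
public class RelationalConversion
{
    // Evaluates the expression and returns its boolean value.
    public static boolean test(String expr)
    {
        return (Boolean)evaluate(expr, XPathConstants.BOOLEAN);
    }

    // Evaluates the expression and returns its numeric value.
    public static double number(String expr)
    {
        return (Double)evaluate(expr, XPathConstants.NUMBER);
    }

    private static Object evaluate(String expr, QName type)
    {
        try
        {
            // an empty document is enough: the expressions use only literals
            Document empty = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, empty, type);
        }
        catch (Exception ex)
        {
            throw new IllegalStateException("unable to evaluate: " + expr, ex);
        }
    }

    public static void main(String[] argv)
    {
        System.out.println("'C' = 'C'    : " + test("'C' = 'C'"));      // equality compares the strings
        System.out.println("'B' <= 'D'   : " + test("'B' <= 'D'"));     // both sides convert to NaN
        System.out.println("'2' < '10'   : " + test("'2' < '10'"));     // compared as the numbers 2 and 10
        System.out.println("number('B')  : " + number("number('B')"));
    }
}
```

The third expression is the giveaway: a string comparison would put "2" after "10", but XPath 1.0 compares them as numbers.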

How to solve this problem? One approach is to use an XPath 2.0 implementation, such as Saxon. But, assuming that you're stuck with XPath 1.0, this is where a user-defined function will do the job:

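import java.util.List;
import java.util.UUID;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import net.sf.practicalxml.ParseUtil;
import net.sf.practicalxml.xpath.XPathWrapper;


public class XPathRange2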
{
    private final static String xml = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='5' code='E'>bargle</entry>"
                                    + "</root>";
    
    private final static String FUNCTION_NS = "urn:uuid:" + UUID.randomUUID();
    
    private static class XPathRange implements XPathFunction
    {
        @Override
        @SuppressWarnings("rawtypes")
        public Object evaluate(List args) throws XPathFunctionException
        {
            try
            {
                java.util.Iterator argItx = args.iterator();
                NodeList selection = (NodeList)argItx.next();
                String lowerBound = (String)argItx.next();
                String upperBound = (String)argItx.next();
                
                String value = (selection.getLength() > 0) ? selection.item(0).getNodeValue() : "";
                return (value.compareTo(lowerBound) >= 0) && (value.compareTo(upperBound) <= 0);
            }
            catch (Exception ex)
            {
                throw new XPathFunctionException(ex);
            }
        }
    }
    

    public static void main(String[] argv) throws Exception
    {
        Document dom = ParseUtil.parse(xml);
        
        List<Node> selection1 = new XPathWrapper("//entry[@code='C']").evaluate(dom);
        System.out.println("nodes selected by exact match = " + selection1.size());
        
        List<Node> selection2 = new XPathWrapper("//entry[fn:inRange(@code, 'B', 'D')]")
                                .bindNamespace("fn", FUNCTION_NS)
                                .bindFunction(new QName(FUNCTION_NS, "inRange"), new XPathRange(), 3)
                                .evaluate(dom);
        System.out.println("nodes selected by function    = " + selection2.size());
    }
}
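
For readers who don't want the library dependency, here is roughly the same thing using only the javax.xml.xpath API. Treat it as a sketch under a few assumptions: the class name JdkRangeFunction, the count() helper, and the namespace URI are all invented for this example; the NamespaceContext is deliberately minimal (one prefix, no handling of the reserved xml and xmlns prefixes); and it assumes that extension functions are enabled in the XPath factory, which I believe is the JDK default.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical JDK-only variant of the inRange() example.
public class JdkRangeFunction
{
    // Assumption: any fixed URI works, provided the prefix binding and resolver agree
    private static final String FUNCTION_NS = "urn:example:xpath-functions";

    private static final String XML = "<root>"
                                    +     "<entry id='1' code='A'>foo</entry>"
                                    +     "<entry id='2' code='A'>bar</entry>"
                                    +     "<entry id='3' code='C'>baz</entry>"
                                    +     "<entry id='4' code='E'>argle</entry>"
                                    +     "<entry id='5' code='E'>bargle</entry>"
                                    + "</root>";

    // Minimal single-prefix namespace context: "fn" maps to FUNCTION_NS
    private static final NamespaceContext NAMESPACES = new NamespaceContext()
    {
        @Override
        public String getNamespaceURI(String prefix)
        {
            if (prefix == null)
                throw new IllegalArgumentException("prefix may not be null");
            return "fn".equals(prefix) ? FUNCTION_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri)
        {
            return FUNCTION_NS.equals(uri) ? "fn" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri)
        {
            String prefix = getPrefix(uri);
            return (prefix == null) ? Collections.<String>emptyIterator()
                                    : Collections.singleton(prefix).iterator();
        }
    };

    // Compares the first selected node's value to the bounds as strings
    private static Object inRange(List<?> args)
    {
        NodeList selection = (NodeList)args.get(0);
        String lowerBound = String.valueOf(args.get(1));
        String upperBound = String.valueOf(args.get(2));
        String value = (selection.getLength() > 0) ? selection.item(0).getNodeValue() : "";
        return (value.compareTo(lowerBound) >= 0) && (value.compareTo(upperBound) <= 0);
    }

    // Evaluates the expression against the sample XML, returns the number of selected nodes
    public static int count(String expression)
    {
        try
        {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(NAMESPACES);
            xpath.setXPathFunctionResolver((name, arity) ->
            {
                boolean matches = FUNCTION_NS.equals(name.getNamespaceURI())
                               && "inRange".equals(name.getLocalPart())
                               && (arity == 3);
                XPathFunction fn = JdkRangeFunction::inRange;
                return matches ? fn : null;
            });
            NodeList nodes = (NodeList)xpath.evaluate(expression,
                                                      new InputSource(new StringReader(XML)),
                                                      XPathConstants.NODESET);
            return nodes.getLength();
        }
        catch (Exception ex)
        {
            throw new IllegalStateException("unable to evaluate: " + expression, ex);
        }
    }

    public static void main(String[] argv)
    {
        System.out.println("nodes selected by comparison  = " + count("//entry[(@code >= 'B') and (@code <= 'D')]"));
        System.out.println("nodes selected by function    = " + count("//entry[fn:inRange(@code, 'B', 'D')]"));
    }
}
```

The function compares with String.compareTo(), so the ordering is by UTF-16 code unit; if you need locale-aware ordering, a Collator would be the place to start.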

* All examples in this post use the Practical XML library to reduce boilerplate code.