Monday, October 27, 2014

JDK Bug (Maybe): Namespaced XPath Variables

I'm revising my XPath article, and stumbled onto some strange behavior with namespaced variables.

For those not intimately familiar with the XPath spec, a VariableReference is a dollar sign followed by a qualified name. A qualified name may include a namespace prefix, and the spec requires that prefixes be associated with namespace declarations.

Here are two example paths, one with a namespace prefix and one without.

    //baz[@name=$myname]
    //baz[@name=$ns:myname]

Seems simple enough, let's see some example code. SimpleNamespaceResolver is a class from the Practical XML library that manages a single-entry namespace context. I use it here because a complete and correct implementation of NamespaceContext would be a distraction.

import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;

import org.w3c.dom.Document;

import org.xml.sax.InputSource;

import net.sf.practicalxml.xpath.SimpleNamespaceResolver;


public class VariableExample
{
    public static void main(String[] argv) throws Exception
    {
        String xml = "<foo>"
                   + "    <bar name='argle'>"
                   +          "Argle"
                   + "    </bar>"
                   + "    <bar name='bargle'>"
                   +          "Bargle"
                   + "        <baz>Baz</baz>"
                   + "    </bar>"
                   + "</foo>";
        
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new SimpleNamespaceResolver("ns", "foo"));
        xpath.setXPathVariableResolver(new XPathVariableResolver()
        {
            @Override
            public Object resolveVariable(QName var)
            {
                System.out.println("prefix    = " + var.getPrefix());
                System.out.println("namespace = " + var.getNamespaceURI());
                System.out.println("localName = " + var.getLocalPart());
                if (var.getLocalPart().equals("name"))
                    return "argle";
                else
                    return "";
            }
        });        
        
        String result = xpath.evaluate("//bar[@name=$ns:name]", dom);
        
        System.out.println("selected content: \"" + result + "\"");
    }
}

When we run this, it produces the following output:

prefix    = 
namespace = foo
localName = name
selected content: "Argle    "

The prefix has disappeared, but that's OK: we have the namespace. However, what if we comment out the setNamespaceContext() call?

prefix    = 
namespace = ns
localName = name
selected content: "Argle    "

The prefix has now become the namespace, without an exception being thrown.

Is this a real problem? I searched the Java Bug Parade and Apache Bug Database and didn't see anyone reporting it as an issue, so have to assume the answer is “no.”

Perhaps nobody uses namespaced variable references in the real world. I think this is a good idea on principle: if you have so many variables that you need namespaces, your XPath expressions are probably too complex.

And given the possibility that a misspelling will cause your expressions to silently fail, I think it's a good idea in practice as well.