Friday, July 11, 2014

Namespaces Aren't Just Attributes

Here's some code that builds an XML document and prints it (you'll find OutputUtil in the Practical XML library; it's a lot easier than configuring a serializer):

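// JDK imports used by the snippets in this post
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

private static Document createDom() throws Exception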
{
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    Document dom = dbf.newDocumentBuilder().newDocument();
    
    Element root = dom.createElement("data:root");
    dom.appendChild(root);
    root.setAttribute("xmlns:data", "http://www.example.com");
    root.setTextContent("foo");
    
    return dom;
}

public static void main(String[] argv)
throws Exception
{
    Document dom = createDom();
    System.out.println(OutputUtil.compactString(dom));
}

If you run this code, it produces the following output:

<data:root xmlns:data="http://www.example.com">foo</data:root>

Looks good; if you were using this output to debug a program, you might think that createDom() was doing the right thing. And you might be very confused to discover that your XPath expressions weren't returning anything (again, I'm using Practical XML to reduce boilerplate):

public static void main(String[] argv)
throws Exception
{
    Document dom = createDom();
    XPathWrapper xpath = new XPathWrapper("//data:root").bindNamespace("data", "http://www.example.com");
    System.out.println(xpath.evaluateAsString(dom));
}
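
To see the empty result without any third-party code, here is a sketch of the same query using only javax.xml.xpath. It is not the code behind XPathWrapper: the class name JdkXPathDemo and the anonymous NamespaceContext are my own illustration, and createDom() is repeated from above so that the example stands alone.

```java
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; only the JDK APIs it calls are real.
class JdkXPathDemo
{
    // Same construction as createDom() above: createElement() plus an xmlns attribute.
    static Document createDom() throws Exception
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder().newDocument();

        Element root = dom.createElement("data:root");
        dom.appendChild(root);
        root.setAttribute("xmlns:data", "http://www.example.com");
        root.setTextContent("foo");

        return dom;
    }

    // Evaluates //data:root with the "data" prefix bound to the example namespace.
    static String evaluate(Document dom) throws Exception
    {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext()
        {
            public String getNamespaceURI(String prefix)
            {
                return "data".equals(prefix) ? "http://www.example.com"
                                             : XMLConstants.NULL_NS_URI;
            }

            // The reverse lookups aren't needed for this query.
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });
        return xpath.evaluate("//data:root", dom);
    }

    public static void main(String[] argv)
    throws Exception
    {
        // Brackets make an empty string visible in the output.
        System.out.println("[" + evaluate(createDom()) + "]");
    }
}
```

The brackets are only there so that an empty string shows up as something; when the expression selects nothing, they print with nothing between them.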

At this point you might blame the library and switch to the XPath implementation provided by the JDK, but you'd get the same empty result. To debug this problem, you need to be more creative:

public static void main(String[] argv)
throws Exception
{
    Document dom = createDom();
    Element root = dom.getDocumentElement();
    System.out.println("nodeName      = " + root.getNodeName());
    System.out.println("namespace URI = " + root.getNamespaceURI());
    System.out.println("prefix        = " + root.getPrefix());
    System.out.println("localname     = " + root.getLocalName());
}

Run this and you'll see the following output:

nodeName      = data:root
namespace URI = null
prefix        = null
localname     = null

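What's happening here is that adding an xmlns attribute doesn't change a node's namespace. An element's namespace URI and local name are fixed when the element is created, and createElement() creates an element with no namespace at all: "data:root" is just a name that happens to contain a colon. The serializer writes that name and the xmlns:data attribute verbatim, which is why the output looks correct. The DOM, in Java and in other language bindings, requires you to create the element with an explicit namespace-aware method: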

Element root = dom.createElementNS("http://www.example.com", "data:root");
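
Here is a sketch of what a corrected createDom() might look like, together with the same four diagnostic calls. The class name NamespacedDomDemo and the describe() helper are mine, not part of any library. I've also switched the xmlns declaration to setAttributeNS() with the reserved xmlns namespace URI, which is an assumption about what you'd want: it keeps the declaration in the tree as a proper namespace attribute rather than relying on the serializer to add one.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; only the JDK APIs it calls are real.
class NamespacedDomDemo
{
    static final String NS = "http://www.example.com";

    static Document createDom() throws Exception
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder().newDocument();

        // The namespace is assigned here, at creation time.
        Element root = dom.createElementNS(NS, "data:root");
        dom.appendChild(root);

        // Namespace declarations are themselves attributes in the xmlns namespace.
        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:data", NS);
        root.setTextContent("foo");

        return dom;
    }

    // Formats the same four properties that the diagnostic program prints.
    static String describe(Element elem)
    {
        return "nodeName      = " + elem.getNodeName() + "\n"
             + "namespace URI = " + elem.getNamespaceURI() + "\n"
             + "prefix        = " + elem.getPrefix() + "\n"
             + "localname     = " + elem.getLocalName();
    }

    public static void main(String[] argv)
    throws Exception
    {
        System.out.println(describe(createDom().getDocumentElement()));
    }
}
```

With the element created this way, the namespace URI is http://www.example.com, the prefix is data, and the local name is root, which is what the XPath expression needs to match.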

But what do you do if the createDom() method was written by some other team, and they don't want to listen to you telling them they're doing things wrong? If you can't get them fired, you can always serialize and reparse the XML:

public static void main(String[] argv)
throws Exception
{
    Document dom1 = createDom();
    String xml = OutputUtil.compactString(dom1);
    Document dom2 = ParseUtil.parse(xml);
    
    Element root = dom2.getDocumentElement();
    System.out.println("nodeName      = " + root.getNodeName());
    System.out.println("namespace URI = " + root.getNamespaceURI());
    System.out.println("prefix        = " + root.getPrefix());
    System.out.println("localname     = " + root.getLocalName());
}
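
The reparse works because a namespace-aware parser resolves prefixes from the xmlns declarations it finds in the text, so the serialized string carries everything needed to rebuild the namespaces. As a sketch of that half of the round trip using only the JDK, the following parses the serialized output shown earlier as a literal string. ReparseDemo and its parse() method are illustrative stand-ins; they are not how ParseUtil is implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; only the JDK APIs it calls are real.
class ReparseDemo
{
    static Document parse(String xml) throws Exception
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Factories are not namespace-aware by default; without this call
        // the parsed elements would report a null namespace URI as well.
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] argv)
    throws Exception
    {
        // The serialized form of the document built by createDom().
        String xml = "<data:root xmlns:data=\"http://www.example.com\">foo</data:root>";

        Element root = parse(xml).getDocumentElement();
        System.out.println("nodeName      = " + root.getNodeName());
        System.out.println("namespace URI = " + root.getNamespaceURI());
        System.out.println("prefix        = " + root.getPrefix());
        System.out.println("localname     = " + root.getLocalName());
    }
}
```

This time the element reports http://www.example.com as its namespace URI, data as its prefix, and root as its local name.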
