Thursday, April 28, 2011

JDK Bug 6337981: Writing XML to OutputStream Doesn't Indent

I was recently bitten by JDK bug #6337981. Since it took me a while to narrow down the problem, and longer to find the official bug report, I'm posting it here to hopefully increase its Googleability, and so that I have something to point to when responding to complaints about how Practical XML writes its output.

Here's the code:

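import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlIndentExample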
{
    public static void main(String[] argv)
    throws Exception
    {
        String src = "<root><child>text</child></root>";

        ByteArrayInputStream in1 = new ByteArrayInputStream(src.getBytes("UTF-8"));
        StringWriter out1 = new StringWriter();
        transform(new StreamSource(in1), new StreamResult(out1));
        String result1 = out1.toString();
        System.out.println("writer:\n" + result1);

        ByteArrayInputStream in2 = new ByteArrayInputStream(src.getBytes("UTF-8"));
        ByteArrayOutputStream out2 = new ByteArrayOutputStream();
        transform(new StreamSource(in2), new StreamResult(out2));
        String result2 = new String(out2.toByteArray(), "UTF-8");
        System.out.println("stream:\n" + result2);
    }


    private static void transform(Source source, Result result)
    throws Exception
    {
        TransformerFactory fact = TransformerFactory.newInstance();

        // this is a work-around for bug #6296446; only needed on JDK 1.5
        fact.setAttribute("indent-number", Integer.valueOf(4));

        Transformer xform = fact.newTransformer();

        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.setOutputProperty(OutputKeys.INDENT, "yes");

        // since we set the "indent-number" attribute on the factory, we
        // don't need to set the indent amount here; uncomment if you
        // think it will make a difference

//        xform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

        xform.transform(source, result);
    }
}

When you run this, the version that writes to a ByteArrayOutputStream isn't indented, while the version that writes to a StringWriter is.

The bug is marked as low priority, and has been open since 2005. And I was unable to find any mention of a similar problem in the Xerces bug database. All of which means that it's unlikely to get fixed any time soon.

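There is a work-around, mentioned in the bug report: wrap the OutputStream in an OutputStreamWriter. That works, but you need to pay attention to encoding: left to itself, an OutputStreamWriter uses the platform default encoding, which may not be the encoding that whoever reads the XML will assume. Always (always!) tell the OutputStreamWriter to use UTF-8: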

OutputStreamWriter wrapped = new OutputStreamWriter(out, "UTF-8");
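
For completeness, here is a minimal sketch of the work-around applied to the example above. The class name XmlIndentWorkaround and the helper writeIndented() are my own, made up for illustration; the transformer setup is copied from the original example, and I'm assuming the JDK's built-in transformer is the one on the classpath (another implementation may reject the "indent-number" attribute). Note the explicit flush(): the OutputStreamWriter buffers encoded bytes, so without it some output may never reach the underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlIndentWorkaround
{
    // hypothetical helper: writes the XML to the stream by way of a Writer,
    // so that the serializer takes the code path that honors indentation
    public static void writeIndented(String xml, OutputStream out)
    throws Exception
    {
        TransformerFactory fact = TransformerFactory.newInstance();

        // same factory attribute as the original example; assumes the
        // JDK's built-in transformer implementation
        fact.setAttribute("indent-number", Integer.valueOf(4));

        Transformer xform = fact.newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.setOutputProperty(OutputKeys.INDENT, "yes");

        // keep the serializer's idea of the encoding in sync with the writer
        xform.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        ByteArrayInputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
        Writer wrapped = new OutputStreamWriter(out, "UTF-8");
        xform.transform(new StreamSource(in), new StreamResult(wrapped));

        // push any buffered bytes through to the underlying stream
        wrapped.flush();
    }

    public static void main(String[] argv)
    throws Exception
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeIndented("<root><child>text</child></root>", out);
        System.out.println("wrapped stream:\n" + new String(out.toByteArray(), "UTF-8"));
    }
}
```

The helper doesn't close the stream; that stays the caller's job, since the caller opened it.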
