I was recently bitten by JDK bug #6337981. Since it took me a while to narrow down the problem, and longer to find the official bug report, I'm posting it here to hopefully increase its Googleability, and so that I can respond to any complaints about how Practical XML does output.
Here's the code:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlIndentExample
{
    public static void main(String[] argv) throws Exception
    {
        String src = "<root><child>text</child></root>";

        // first pass: the Result wraps a Writer
        ByteArrayInputStream in1 = new ByteArrayInputStream(src.getBytes("UTF-8"));
        StringWriter out1 = new StringWriter();
        transform(new StreamSource(in1), new StreamResult(out1));
        String result1 = out1.toString();
        System.out.println("writer:\n" + result1);

        // second pass: the Result wraps an OutputStream
        ByteArrayInputStream in2 = new ByteArrayInputStream(src.getBytes("UTF-8"));
        ByteArrayOutputStream out2 = new ByteArrayOutputStream();
        transform(new StreamSource(in2), new StreamResult(out2));
        String result2 = new String(out2.toByteArray(), "UTF-8");
        System.out.println("stream:\n" + result2);
    }

    private static void transform(Source source, Result result) throws Exception
    {
        TransformerFactory fact = TransformerFactory.newInstance();

        // this is a work-around for bug #6296446; only needed on JDK 1.5
        fact.setAttribute("indent-number", Integer.valueOf(4));

        Transformer xform = fact.newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.setOutputProperty(OutputKeys.INDENT, "yes");

        // since we set the "indent-number" attribute on the factory, we
        // don't need to set the indent amount here; uncomment if you
        // think it will make a difference
        // xform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

        xform.transform(source, result);
    }
}
When you run this, the version that writes to a ByteArrayOutputStream isn't indented, while the version that writes to a StringWriter is.
The bug is marked as low priority and has been open since 2005, and I was unable to find any mention of a similar problem in the Xerces bug database. All of which means that it's unlikely to get fixed any time soon.
There is a work-around, mentioned in the bug report: wrap the OutputStream in an OutputStreamWriter. That works, but you need to pay attention to encoding, because an OutputStreamWriter created without an explicit encoding uses the JVM's default charset. Always, always, tell the OutputStreamWriter to use UTF-8:
OutputStreamWriter wrapped = new OutputStreamWriter(out, "UTF-8");
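For anyone who wants to try the work-around end to end, here is a minimal sketch of how it might be packaged. The class name IndentWorkaround and both of its methods are made up for illustration; they aren't part of the JDK or of Practical XML. It also assumes the JDK's built-in transformer, and it sets the indent amount through the Xalan-style output property that is commented out in the example above, which other implementations may ignore.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; the class and method names are illustrative only.
final class IndentWorkaround
{
    // Copies the XML to the stream with indentation turned on, routing
    // the output through a UTF-8 Writer rather than the raw stream.
    static void writeIndented(String xml, OutputStream out)
    {
        try
        {
            Transformer xform = TransformerFactory.newInstance().newTransformer();
            xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            xform.setOutputProperty(OutputKeys.INDENT, "yes");

            // Assumption: the transformer honors this Xalan-specific
            // property; implementations that don't know it may ignore it.
            xform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

            // The work-around: hand the transformer a Writer, and be
            // explicit about the encoding.
            Writer wrapped = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            xform.transform(new StreamSource(new StringReader(xml)),
                            new StreamResult(wrapped));

            // Make sure buffered characters reach the underlying stream.
            wrapped.flush();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("unable to write indented XML", e);
        }
    }

    // Convenience wrapper: run the work-around against an in-memory stream.
    static String indentToString(String xml)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeIndented(xml, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] argv)
    {
        System.out.println(indentToString("<root><child>text</child></root>"));
    }
}
```

The explicit flush() is there because the OutputStreamWriter buffers internally, and I'd rather not rely on the transformer to flush it. If you keep the XML declaration, set OutputKeys.ENCODING to match the writer's encoding, so that the prologue doesn't claim something different from what actually gets written.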