I was recently bitten by JDK bug #6337981. It took me a while to narrow down the problem, and longer to find the official bug report, so I'm posting it here to hopefully increase its Googleability, and so that I can respond to any complaints about how Practical XML does output.
Here's the code:
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.StringWriter;

    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class XmlIndentExample
    {
        public static void main(String[] argv)
        throws Exception
        {
            String src = "<root><child>text</child></root>";

            // case 1: the result wraps a Writer
            ByteArrayInputStream in1 = new ByteArrayInputStream(src.getBytes("UTF-8"));
            StringWriter out1 = new StringWriter();
            transform(new StreamSource(in1), new StreamResult(out1));
            String result1 = out1.toString();
            System.out.println("writer:\n" + result1);

            // case 2: the result wraps an OutputStream
            ByteArrayInputStream in2 = new ByteArrayInputStream(src.getBytes("UTF-8"));
            ByteArrayOutputStream out2 = new ByteArrayOutputStream();
            transform(new StreamSource(in2), new StreamResult(out2));
            String result2 = new String(out2.toByteArray(), "UTF-8");
            System.out.println("stream:\n" + result2);
        }

        private static void transform(Source source, Result result)
        throws Exception
        {
            TransformerFactory fact = TransformerFactory.newInstance();

            // this is a work-around for bug #6296446; only needed on JDK 1.5
            fact.setAttribute("indent-number", Integer.valueOf(4));

            Transformer xform = fact.newTransformer();
            xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            xform.setOutputProperty(OutputKeys.INDENT, "yes");

            // since we set the "indent-number" attribute on the factory, we
            // don't need to set the indent amount here; uncomment if you
            // think it will make a difference
            // xform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

            xform.transform(source, result);
        }
    }
When you run this, the version that writes to a ByteArrayOutputStream isn't indented, while the version that writes to a StringWriter is.
The bug is marked as low priority and has been open since 2005, and I was unable to find any mention of a similar problem in the Xerces bug database. All of which means that it's unlikely to get fixed any time soon.
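There is a work-around, mentioned in the bug report: wrap the OutputStream in an OutputStreamWriter. That works, but you need to pay attention to encoding. If you don't name one, the OutputStreamWriter uses the platform default encoding, which may not be what the reader of your XML expects. So always, always tell the OutputStreamWriter to use UTF-8: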
OutputStreamWriter wrapped = new OutputStreamWriter(out, "UTF-8");
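For completeness, here is a rough sketch of how that one-liner might slot into the example above. The class name XmlIndentWorkaround and the helper toIndentedString are names I made up for illustration; the transformer setup is the same as in the original code. I've guarded the "indent-number" call because a factory other than the JDK's built-in one may not recognize that attribute, and I flush the writer explicitly rather than assume the transformer does it for me.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class name; a sketch of the work-around, not a library API.
class XmlIndentWorkaround
{
    // Hypothetical helper: serializes the XML in src to bytes by way of an
    // OutputStreamWriter, then decodes those bytes so they can be inspected.
    static String toIndentedString(String src)
    throws Exception
    {
        ByteArrayInputStream in = new ByteArrayInputStream(src.getBytes("UTF-8"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // the work-around: hand the transformer a Writer, not the raw stream,
        // and name the encoding explicitly
        OutputStreamWriter wrapped = new OutputStreamWriter(out, "UTF-8");

        TransformerFactory fact = TransformerFactory.newInstance();
        try
        {
            fact.setAttribute("indent-number", Integer.valueOf(4));
        }
        catch (IllegalArgumentException ignored)
        {
            // this factory doesn't recognize the attribute; carry on without it
        }

        Transformer xform = fact.newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.setOutputProperty(OutputKeys.INDENT, "yes");
        xform.transform(new StreamSource(in), new StreamResult(wrapped));

        // push any characters still buffered in the writer out to the stream
        wrapped.flush();

        return new String(out.toByteArray(), "UTF-8");
    }

    public static void main(String[] argv)
    throws Exception
    {
        System.out.println(toIndentedString("<root><child>text</child></root>"));
    }
}
```

The bytes still end up in the ByteArrayOutputStream, so callers that need an OutputStream don't have to change; only the Result handed to the transformer does.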