Tuesday, April 7, 2009

JSP tag handlers can't have an attribute named “class”

Recently I've been implementing some tag handlers that emit HTML. In them, I want to carry through some of the standard HTML attributes such as id and class. And I've discovered that Tomcat is quite happy to accept the following TLD entry, but at runtime complains that it's unable to find a setter method for the attribute:

<attribute>
  <name>class</name>
  <rtexprvalue>true</rtexprvalue>
  <type>java.lang.String</type>
</attribute>

Sun provides a schema definition for the JSP taglib deployment descriptor, and it defines the type tld-attributeType for attribute declarations. Looking at that type definition, we see this:

<xsd:element name="name"
   type="j2ee:java-identifierType"/>

And jumping to the common definitions schema, here's the definition of java-identifierType:

<xsd:complexType name="java-identifierType">
  <xsd:annotation>
    <xsd:documentation>

        The java-identifierType defines a Java identifier.
        The users of this type should further verify that
        the content does not contain Java reserved keywords.

    </xsd:documentation>
  </xsd:annotation>

  <xsd:simpleContent>
    <xsd:restriction base="j2ee:string">
      <xsd:pattern value="($|_|\p{L})(\p{L}|\p{Nd}|_|$)*"/>
    </xsd:restriction>
  </xsd:simpleContent>

</xsd:complexType>

Did you read the comment? That users of the type should perform keyword validation?!? I couldn't believe it, until I turned to the XML Schema docs and discovered that there's no practical way to exclude specific values: the enumeration facet lists legal values only.
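
For what it's worth, the "further verification" that the schema comment asks for is easy to sketch in Java. The class and method names below are my own inventions for illustration, not anything from Tomcat or the JSP API. The pattern is the one from the schema, except that $ is escaped, because java.util.regex treats a bare $ as an end-of-input anchor where XML Schema treats it as a literal character. The keyword test uses javax.lang.model.SourceVersion from the JDK.

```java
import java.util.regex.Pattern;
import javax.lang.model.SourceVersion;

public class AttributeNameCheck {
    // The java-identifierType pattern from the schema, with '$' escaped for java.util.regex.
    private static final Pattern JAVA_IDENTIFIER_TYPE =
        Pattern.compile("(\\$|_|\\p{L})(\\p{L}|\\p{Nd}|_|\\$)*");

    /** True if the name passes the schema's pattern facet. */
    public static boolean matchesSchemaPattern(String name) {
        return JAVA_IDENTIFIER_TYPE.matcher(name).matches();
    }

    /** True if the name passes the pattern and is not a Java keyword or literal. */
    public static boolean isUsableAttributeName(String name) {
        return matchesSchemaPattern(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"id", "class", "htmlClass"}) {
            System.out.println(name
                + ": schema=" + matchesSchemaPattern(name)
                + ", usable=" + isUsableAttributeName(name));
        }
    }
}
```

The point of the sketch is that class gets through the schema's pattern but fails the keyword check, which is exactly the gap the documentation comment leaves to "users of this type".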

So, two lessons to draw from this: first, XML Schema is not only complex but incomplete, and second, you can't use class, or any other Java keyword, as a JSP tag attribute. At least not in a servlet container developed by people who have read the schema docs, because this restriction is not mentioned in the JSP specification text.

My solution? The ugly but descriptive htmlClass.
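
Here is a minimal sketch of what the renamed attribute looks like on the Java side. DivTag is a hypothetical stand-in: a real handler would extend one of the JSP tag base classes and write to the page's output rather than return a string, and it would need to escape the attribute values, which this sketch skips. The only point is that the setter is named setHtmlClass while the emitted HTML attribute is still class.

```java
public class HtmlClassSketch {
    /** Hypothetical stand-in for a tag handler that emits a <div>. */
    public static class DivTag {
        private String id;
        private String htmlClass;

        // Setter for the "id" attribute declared in the TLD.
        public void setId(String id) {
            this.id = id;
        }

        // Setter for the "htmlClass" attribute; the name avoids the Java keyword.
        public void setHtmlClass(String htmlClass) {
            this.htmlClass = htmlClass;
        }

        /** Builds the opening tag, emitting htmlClass as the HTML class attribute. */
        public String openTag() {
            StringBuilder sb = new StringBuilder("<div");
            if (id != null) {
                sb.append(" id=\"").append(id).append('"');
            }
            if (htmlClass != null) {
                sb.append(" class=\"").append(htmlClass).append('"');
            }
            return sb.append('>').toString();
        }
    }

    public static void main(String[] args) {
        DivTag tag = new DivTag();
        tag.setId("nav");
        tag.setHtmlClass("menu");
        System.out.println(tag.openTag());
    }
}
```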

(originally posted on my website 19 Nov 08)

Strings sharing backing arrays

For some reason, I thought this bug was fixed years ago … it was reported in 2001, after all. Yet today I was poking around in the JDK 1.5 source, and what do I find?

return ((beginIndex == 0) && (endIndex == count)) ? this :
    new String(offset + beginIndex, endIndex - beginIndex, value);

This particular string constructor is package-private, and exists solely for creating strings that share an existing backing array rather than copying it:

String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

So why is this important? Let's say that you're extracting two fields from a large file of fixed-length lines of text. The file contains a gigabyte of data, millions of lines, but you figure you're OK because you're only saving 11 characters per line:

while ((line = reader.readLine()) != null)
{
    values.put(line.substring(0, 5), line.substring(17, 23));
}

Will you be surprised by the OutOfMemoryError? It happens because those substring calls don't actually extract characters from the large string. Instead, they create new strings that reference the same character array. Once you put them in the map, that character array becomes ineligible for collection, even though line is. The solution is one of the few cases where you should call a string constructor directly:

while ((line = reader.readLine()) != null)
{
    String s1 = new String(line.substring(0, 5));
    String s2 = new String(line.substring(17, 23));
    values.put(s1, s2);
}

This constructor, which seems to do nothing other than copy the string, actually has a lot of logic inside: if it determines that the source string has a backing array that's larger than it needs, it creates a new backing array. In this example, the new backing arrays allow the array backing line to become unreachable and be collected. No more OutOfMemoryError.
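
To make the sharing and the trimming concrete, here is a toy model of the layout described above. SharedStringModel is my own illustrative class, not the real java.lang.String. It keeps the same three fields as the JDK 1.5 source, its substring mirrors the snippet quoted earlier, and its copy constructor is a simplified guess at the trimming logic: copy only when the array is bigger than the string needs.

```java
import java.util.Arrays;

public class SharedStringModel {
    // Same three fields as the JDK 1.5 String; this is a toy, not the real class.
    private final char[] value;
    private final int offset;
    private final int count;

    public SharedStringModel(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Mirrors the package-private constructor: shares the array, no copy.
    SharedStringModel(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Simplified copy constructor: trims the backing array if it is oversized.
    public SharedStringModel(SharedStringModel original) {
        if (original.value.length > original.count) {
            this.value = Arrays.copyOfRange(original.value, original.offset,
                                            original.offset + original.count);
            this.offset = 0;
        } else {
            this.value = original.value;
            this.offset = original.offset;
        }
        this.count = original.count;
    }

    public SharedStringModel substring(int beginIndex, int endIndex) {
        return (beginIndex == 0 && endIndex == count) ? this
            : new SharedStringModel(offset + beginIndex, endIndex - beginIndex, value);
    }

    /** Number of chars kept alive by this object, which may exceed its length. */
    public int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedStringModel line = new SharedStringModel("0123456789abcdefghijklmnopqrstuvwxyz");
        SharedStringModel field = line.substring(0, 5);
        SharedStringModel copy = new SharedStringModel(field);
        System.out.println(field + " retains " + field.retainedChars() + " chars");
        System.out.println(copy + " retains " + copy.retainedChars() + " chars");
    }
}
```

In the model, the 5-character substring keeps all 36 characters of the line alive, while the copy keeps only its own 5.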

This leaves the question: when and where should you use this technique? There are lots of cases where you'll know that you're working with small strings, or will be breaking a large string into pieces and keeping all of the pieces. In these cases, the default behavior makes sense.

But in cases where you don't know what will become of the substring (such as when you're writing a utility method), by all means construct a new string. Object creation, we're told, is extremely fast. Is it worth running out of memory to save a few cycles?
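
As a sketch of what such a utility method might look like (the class and method names are made up for illustration), the defensive copy can be hidden behind a helper so that callers never have to think about it:

```java
public class Substrings {
    private Substrings() {
        // static helpers only
    }

    /**
     * Returns the requested substring wrapped in a new String, so the result
     * does not depend on the source string's backing array.
     */
    public static String detachedSubstring(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String line = "ABCDE            FGHIJK and the rest of a long fixed-length line";
        System.out.println(detachedSubstring(line, 0, 5));
        System.out.println(detachedSubstring(line, 17, 23));
    }
}
```

As far as I know, substring itself started copying its characters in Java 7 update 6, so on newer JDKs the extra constructor call is redundant but harmless.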