blog.kdgregory.com: August 2011

Wednesday, August 31, 2011

Using a Local Repository Server with Gradle

I've been doing a little work with Gradle recently. And one of the things that I find “less than optimal” is that the build script holds far too much knowledge about its environment, which means that you have to jump through some hoops to make those scripts portable. Not a huge problem for in-house development, but if you're making an open-source library, you don't want everyone else to reconfigure their world to match yours.

One particular problem is how to find your dependencies. A typical build script has a repositories section that lists all the places to look. Here's a simple example, that looks first in the local Maven repository, followed by the Maven Central:

repositories {
    mavenLocal()
    mavenCentral()
}

This is a portable build script — although I have no idea how dependencies might find their way to the local Maven repository, since Gradle uses its own dependency cache. A better build script might want to use a local repository server rather than constantly hitting Maven Central:

repositories {
    mavenRepo urls: 'http://intranet.example.com/repository'
}

That works, but now you can't share the build script with anybody else, unless they edit the script to use their own repository server (assuming they have one), and remember not to check in their changes. The solution that I came up with is to store the repository URL in $HOME/.gradle/gradle.properties, which is loaded for every build.

internalRepositoryUrl: http://repo.traffic.com:8081/nexus/content/groups/public/

Then, the build script is configured to add the local server only if the property is defined:

repositories {
    mavenLocal()

    if (project.hasProperty('internalRepositoryUrl') )
        mavenRepo urls: project.internalRepositoryUrl
    else
        mavenCentral()
}

It's portable, but it's ugly. When searching for solutions, I saw a couple of postings indicating that gradle.properties will eventually be allowed to contain expressions as well as properties. That day can't come soon enough.

Wednesday, August 17, 2011

Meta Content-Type is a Bad Idea

Following last week's posting about “text files,” I wanted to look at one of the most common ways to deliver text: the web. The HTTP protocol defines a Content-Type header, which specifies how a user agent (read: browser) should interpret the response body. The content type of an HTML document is text/html; breaking from other “text” types, its default character set is ISO-8859-1. However, you can specify the document's encoding as part of the Content-Type, and most websites do.

All well and good, except that an HTML document can specify its own encoding, using the http-equiv meta tag:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="fr" dir="ltr" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Wikipédia, l'encyclopédie libre</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Wikipedia does “meta” Content-Type about as well as you can: the page is delivered with a Content-Type header specifying UTF-8, and it's an XHTML document (which implies UTF-8 encoding in the absence of a prologue). The only questionable practice with this page is the location of the <title> tag: it contains UTF-8 content, but appears before the in-document Content-Type. But in this case the in-document content type specification is superfluous.

Not all non-English pages do as well. The Montreal Craigslist page, for example, specifies ISO-8859-1 in the HTTP response, but UTF-8 in the meta tag.^* It is a testament to browser developers adhering to Postel's Law that you can read the site at all.

From a “layered architecture” perspective, the embedded content-type declaration is ugly. You could argue that it self-describes a stand-alone document, much like the prologue in an XML document. But there's an important difference: the bytes of an XML prologue are rigidly specified; the parser doesn't need to know the encoding to read them. The <meta> tag can appear anywhere in the <head> of an HTML document. Including, as shown by Wikipedia, after content that requires knowledge of the encoding.

While writing this post, I did a quick search for a history of the embedded Content-Type specification. I turned up a W3C page that recommended always using it, but did not give a rationale. And I found a page that claimed specifying a character set in the HTTP response would “break older browsers.” As the page did not list those browsers, and did not appear to be written by someone involved in browser development, I'm not sure that I believe it.

For my personal website, I rely on the HTTP header, and don't use the meta tag. But I also limit myself to US-ASCII text, with HTML or numeric entities for anything that isn't ASCII. I'm not going to suggest that you remove the tag from your website (who knows, your biggest customer might have an “older browser”). But if you do use it, it should be the first thing in your <head>.

More important than whether the <meta> tag is present is that you actually get the encoding right, both in the page and in the HTTP headers.

With servlets, it's easy: the first line of your service method should be a call to ServletResponse.setContentType().

response.setContentType("text/html;charset=UTF-8");

This will set the Content-Type header and also configure the object returned by ServletResponse.getWriter(). Don't, under any circumstances, write HTML data via the object returned by ServletResponse.getOutputStream(); it exists for servlets that produce binary content.

With JSP, put the following two directives at the top of each page.

<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>

These are translated into a call to ServletResponse.setContentType(), and are also used by the JSP container itself to parse the page. If, after reading this posting, you don't feel comfortable writing self-describing files, you can also use a JSP property group in your web.xml.

One final thing: if you do choose to specify content type via http-equiv, make sure that it matches what your server is putting in the HTTP response. Otherwise, you risk having your site used as an example by someone writing about encodings.

* The Paris Craigslist omits the <meta> declaration, but retains ISO-8859-1 in the HTTP response. Which explains why all of the ads say “EUR” rather than €.

Friday, August 12, 2011

"Text File" is an Oxymoron

Back in the early 1990s, life was easy. If you worked in the United States, “text” meant ASCII. If you worked in Canada or Europe, it might mean with ISO-8859-1 or windows-1252, but they were almost the same thing … unless you dealt with currency and needed to display the new Euro symbol. There were a few specialists that thought of text as wchar_t, but they were rare. Companies hired them as contractors rather than full-time employees.

This US-centric view of text is pervasive: any MIME Content-Type that begins with “text” is presumed to be US-ASCII unless it has an explicit character set specifier. Which often trips up people who create XML, which presumes UTF-8 in the absence of an explicit encoding (solution: use application/xml rather than text/xml).

This was the world that Java entered, and it left an indelible imprint. Internally, Java looked to the future, managing strings as Unicode (now UCS-2). But in the IO package, it was firmly rooted in the past, relying on “default encoding” when converting those two-byte Unicode characters into bytes. Even today, in JDK 7, FileReader and FileWriter don't support explicit encodings.

The trouble with a default encoding is that it changes from machine to machine. On my Linux machines, it's UTF-8; on my Windows XP machine at home, it's windows-1252; on my Windows XP machine from work, it's iso-8859-1. Which means that I can only move “text” files between these boxes if they're limited to US-ASCII characters. Not a problem for me, personally, but I work with people from all over the world.

At this point in time, I think the whole idea of “text” is obsolete. There's just a stream of bytes with some encoding applied. To read that stream in Java, use InputStreamReader with an explicit encoding; to write it, use OutputStreamWriter. Or, if you have a library that manages encoding for you, stick with streams.

If you're not doing that, you're doing it wrong. And if you aren't using UTF-8 as the encoding, in my opinion you're doing it poorly.

Wednesday, August 10, 2011

Defining Done

It seems that a lot of Agile teams have a problem defining when a task or story is “done.” I've seen that on the teams that I've worked with, occasionally leading to heated argument at the end of a sprint.

For stories, the definition of done is simple: it's done when the product owner says it is. That may horrify people who live and die by acceptance criteria, but the simple fact is that acceptance criteria are fluid. New acceptance criteria are often discovered after the story is committed, and although most of these should spur the creation of a new story, more likely is that they simply get added to the existing one. And sometimes (not often enough), the product owner decides that the story is “good enough.”

At the task level, however, there are no acceptance criteria. Most teams that I've seen have worked out some measure of doneness that involves test coverage and code reviews. But the problem with such criterial is that they don't actually speak to what the task is trying to accomplish. The code for a task could be 100% covered and peer reviewed, but contribute nothing to the product. I think this is especially likely in teams where individual members go off and work on their own tasks, because peer reviews in that situation tend to be technical rather than holistic. As long as the code looks right, it gets a pass.

In my experience, the last chance for holistic review is the sprint planning meeting. Unfortunately, by the time the team gets to tasking, it's often late in the day and everyone wants to go home. But I've found that by simply asking “how do you plan to test this,” the task descriptions get more exact, and — not surprisingly — the time estimates go up.

Saturday, August 6, 2011

The Horizontal Slice

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

That's one of the basic principles of the Agile Manifesto, and a common approach to satisfying it is the “horizontal slice&rdquo: a complete application, which takes its inputs from real sources and produces outputs that are consumed by real destinations. The application starts life as a bare skeleton, and each release cycle adds functionality.

In theory, at least, there are a lot of benefits to this approach. First and foremost is the “for tomorrow we ship” ethos that a partially-functioning application is better than no application at all. Second, it allows the team to work out the internal structure of the application, avoiding the “oops!” that usually accompanies integration of components developed in isolation. And not least, it keeps the entire team engaged: there's enough work for everyone, without stepping on each others' toes.

But after two recent green-field projects that used this approach, I think there are some drawbacks that outweigh these benefits.

The first is an over-reliance on those “real” sources and sinks; the development team is stuck if they become unavailable. And this happens a lot in a typical development or integration environment, because other teams are doing the same thing. Developing mock implementations is one way to avoid this problem, but convincing a product owner to spend time on mocks when real data is available is an exercise in futility.

The second problem is that software development proceeds in a quantum fashion. I've written about this with regards to unit testing, but it applies even more to complete projects. There's a lot of groundwork that's needed to make a real-world application. Days, perhaps weeks, go by without anything that could be called “functional”; everything is run from JUnit. And then, suddenly, the's a main(), and the application itself exists. Forcing this process into a two-week sprint cycle encourages programmers to hack together whatever is needed to make a demo, without concern for the long term.

And that results in the third problem — and in my opinion the worst: high coupling between components. When you develop a horizontal slice, I think there's less incentive to focus on unit tests, and more to focus on end-to-end tests. After all, that's how you're being judged, and if you get the same level of coverage what does it matter?

On the surface, that's a reasonable argument, but unit tests and integration tests have different goals: the latter test functionality, the former lead you to a better design. If you don't have to test your classes in isolation, it's all to easy to rely on services provided by other parts of the application. The result is a barrier to long-term maintenance, which is where most of a team's development effort is spent.

So is there a solution? The best that I can think of is working backwards: creating a module at a time, that produces real, consumable outputs from mock inputs. These modules don't have to be full-featured, and if fact shouldn't be: the goal is to get something that is well-designed. I think that working backwards gives you a much better design than working forwards because at every stage you know what the downstream stage needs, even if those needs change.

I want to say again that this approach is only for building on the green field. To maintain the building metaphor, it's establishing a foundation for the complete system, on which you add stories (pun intended).

blog.kdgregory.com