Monday, August 31, 2009

Building a Wishlist Service: The Development Environment

Working software is the primary measure of progress

There's a great feedback loop from seeing your code run in situ. Even if you're religious about writing tests, the green bar doesn't match up to pressing a “Submit” button. I think it's one of the reasons that interactive languages like Python have such a devoted following. And a big part of this feeling is the thought that “I could demo this to someone who doesn't know how to code.” Or, as you get closer to the end of the project, “I could ship this.”

This feedback loop works even if you're a one-person team, which is why I started my wishlist project with the file demo.jsp. At first this page held a hard-coded form, which allowed me to invoke the skeleton of a servlet class. From there to documentation: writing up my thoughts on the service API. And then implementing those thoughts, in data transfer objects, JSP taglibs, and more servlet code. It's a circular process: update the JSP, update the servlet, update the docs. Of course, there are tests supporting all the servlet changes, but that web page stays on the screen, ready to submit whatever form data I need.

My development tools help with this process. Eclipse is my primary IDE: I like its editor, debugger, and refactoring tools. But I don't like its support for web applications (to be fair, the last time I tried it out was four years ago; things may have gotten better). For the web-app side, I use NetBeans 5.5, which comes bundled with a Tomcat server (6.5 gives pretty much the same environment but uses an external server; there's a level of comfort from using tools released together). I can build and deploy in a few seconds, then switch right back to either editor to make changes.

For production packaging, of course, I'll need to move to a Maven build. NetBeans is a nice rapid prototyping tool, but I don't like the way that it forces you to use its own Ant scripts. But that's a ways down the road, and Subversion should help me keep my sanity as I break the pieces into three separate build projects. For now, my prototyping environment is rapid enough that I don't feel the need for an interactive language.

Friday, August 28, 2009

Design versus Implementation

My last few postings have discussed the design of my wishlist service in very broad strokes: here's what I want to accomplish, here are a few of the issues that I've thought about. They have no detail; they don't even look at choice of platform. To me, this is the essence of the divide between design and implementation.

I see it as analogous to traditional building architecture, where the lead architect sketches broad forms, the rest of the architecture studio fleshes those out as models, and the mechanical engineers make it all work. It's also similar to a corporate mission statement — at least, a mission statement for a corporation that isn't floundering in indecision.

If the design is complete, then the implementation should flow naturally. My design specifies a RESTful interface using either XML or form-encoded data in the request; this implies that my implementation will have a front-end layer to transform these inputs into a common form. The design calls for separation between base product data and any application-specific data; the data model will reflect that, and the data access code won't need to perform joins or multiple queries to load objects.
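
To make the "common form" idea concrete, here is a minimal sketch of such a front-end layer. It is not the service's actual code: the class name RequestNormalizer, the use of a flat field map as the common form, and the XML layout (one child element per field) are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical front-end helper: turns either request format into one field map.
final class RequestNormalizer {

    static Map<String, String> parse(String contentType, String body) {
        String type = contentType == null ? "" : contentType.toLowerCase();
        try {
            if (type.startsWith("application/x-www-form-urlencoded")) {
                return parseForm(body);
            }
            if (type.startsWith("text/xml") || type.startsWith("application/xml")) {
                return parseXml(body);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed request body", e);
        }
        throw new IllegalArgumentException("unsupported content type: " + contentType);
    }

    // name=value pairs separated by '&', each side URL-decoded
    private static Map<String, String> parseForm(String body) throws Exception {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        if (body.isEmpty()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(URLDecoder.decode(name, "UTF-8"), URLDecoder.decode(value, "UTF-8"));
        }
        return fields;
    }

    // Assumed layout: <anyRoot><fieldName>value</fieldName>...</anyRoot>
    private static Map<String, String> parseXml(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(body.getBytes("UTF-8")));
        Map<String, String> fields = new LinkedHashMap<String, String>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                fields.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return fields;
    }
}
```

With this shape, a form body of productId=A123&quantity=2 and an XML body carrying the same two elements produce equal maps, so the code behind the front end never needs to know which format arrived.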

There are decisions that don't derive from the design: for example, my choice of Java and Servlets. This decision is one of convenience: it minimizes the amount of stuff that I have to learn. Other choices include Java/Spring, Python/Django, and Ruby/Rails. While I have some familiarity with Python, and could learn Ruby in short order, any code that I write with those would not be idiomatic — it would probably look like Java with a different syntax. Eventually, I plan to use the service as a basis for exploring Python and Ruby in an idiomatic way, but the first pass should leverage what I know.

This dividing line still leaves a lot of implementation decisions to be made, and the next several postings will look at those decisions and the implementation process itself.

Thursday, August 27, 2009

Designing a Wishlist Service: Scalability

One of the repeated themes in Founders at Work is how startups were unprepared for scalability issues, particularly in the cases where the company experienced viral growth. In some companies, the founders worked around the clock adding capacity. Viral growth is by its nature unpredictable: you have no idea how fast — or how long — your company will keep growing, and going in you have no idea who is going to like the product. The rest of us, however, should have an intended audience for our product, and can make design decisions based on that audience.

There's a lot of conventional wisdom on designing for scalability, most of which boils down to “cache everything” and “don't maintain session state.” The trouble with conventional wisdom is that it rarely applies to the real world. Caching, for example, is useless if you're constantly changing the data, or if a particular data item is accessed infrequently. And a RESTful web service doesn't store session state by its very nature.

So what usage pattern should we expect from a wishlist service? I'm designing for three types of user:

Wishlist Owner
This person will add and remove items, and perhaps change their rankings. These interactions all happen on human timescales (i.e., several seconds or minutes between changes, then long periods of inactivity), and should involve more writes than reads. For this user, caching is not going to help performance: the list will probably age out of the cache before it's read again, and would need to be constantly purged as items change.
Friends and Family
Wishlist owners may choose to share their wishlist, in the hopes that other people will do the buying. On the surface, it appears that caching might be a good idea: after all, there may be a lot of people accessing the service. But again, we're looking at human timescales: the list will be shared by passing a link, probably in an email. While you may send such an email to a hundred people, they're not going to open it at the same time, so there's no point in caching.
Collaborative Shoppers
One of the original triggers for this project was a collaborative shopping tool, where several people would share a list and make updates (typically comments), using a chat service to broadcast changes. This is one situation where caching would be useful, since there will be a group of people hitting the same data at the same time. They'll also be updating the data constantly, meaning the lifetime of a cached object would be very short.

However, I think that the solution for this scenario is to use caching in front of the application rather than as part of it: either Apache's mod_cache or some sort of caching filter in the application server. To make this work, we need to use URLs that identify the version of the list along with other key data — but this is the RESTful approach anyway.
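
One way to picture a URL that identifies the version of the list is sketched below. The path, the parameter names, and the idea of a numeric version counter are assumptions for illustration; the point is only that every update yields a new URL, so a cache sitting in front of the application never has to be purged explicitly.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a URL whose cache identity changes with every list update.
final class ListUrls {

    static String versionedUrl(String customerId, String listName, long version) {
        return "/wishlist?customer=" + encode(customerId)
             + "&list=" + encode(listName)
             + "&version=" + version;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling ListUrls.versionedUrl("c42", "birthday list", 7) gives a URL that the cache can hold for as long as it likes; once the list moves to version 8, clients ask for a different URL and the old entry simply ages out.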

By coming up with this picture, I can implement the service as a simple database front-end; I don't need to waste time writing explicit code for scalability. And if it goes viral … well, let's just say I don't expect that, particularly with an eCommerce server gating access.

Monday, August 17, 2009

Designing a Wishlist Service: Security

Security is the red-headed stepchild of the web-service world. Sure, there's WS-Security, which may be useful if you're using SOAP in server-server communication. But most services, in particular browser-accessed services, rely on the security mechanisms provided by HTTP and HTTPS. If you want authentication, you use a 401 response and expect the requester to provide credentials in subsequent requests. If you're paranoid, you use a client certificate.

The wishlist service doesn't quite fit into this model. There's a clear need for some form of authorization and control, if only to protect the service from script kiddies with too much time on their hands. Yet authentication has to be transparent for usability: we don't want to pop up login dialogs, particularly if the customer has already logged in to their eCommerce account.

Perhaps the biggest constraint on any security mechanism is that requests will come from a browser. This means that the service can't expect the requester to provide any secret information, because anybody can view the page or script (I'm amazed by just how many web services require a “secret key” in their URL). If a page needs to provide secret information, that information must be encrypted before it is delivered to the user.

And secret information is required, because there are different levels of authorization: some people can update a list, others can only read it. The solution that I chose is to encrypt the wishlist key and user's authorization level, and pass that to the service as a URL parameter. This does represent another departure from pure REST, in that there may be multiple URLs that refer to the same underlying resource, but it seems the most reasonable compromise.

It also has the limitation that anybody who has a URL has the access defined by that URL. In some web services, this would be a problem. In the case of the wishlist service, it's mitigated by several factors.

First among these is that the value of this data just isn't that high. Sure, if you plan to be nominated for the Supreme Court, you probably don't want to keep a wishlist of X-rated videos. But for most people, a wishlist of clothing just isn't that important (and an opt-in warning can scare away those who feel differently). A related factor is that, in normal usage, people won't be handing out these URLs — at least, not URLs that can make changes. There's simply no reason to do so.

A second mitigating factor is that each URL also has an encrypted timeout: the service will reject any requests that arrive after that timeout. While this can be used to exert control over shared lists, it is primarily intended to defeat script kiddies who might try a denial-of-service attack via constant updates.
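
As an illustration of an encrypted URL parameter that carries the wishlist key, the authorization level, and a timeout, here is a minimal sketch. The post does not say which cipher or token layout the service uses, so everything here is an assumption: AES in GCM mode, a newline-separated payload, the class name AccessToken, and the level names. Identifiers are assumed not to contain newlines.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical token: list key + access level + expiry, sealed with a server-side key.
final class AccessToken {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    final String customerId;
    final String listName;
    final String level;          // assumed values: "READ" or "UPDATE"
    final long expiresAtMillis;

    AccessToken(String customerId, String listName, String level, long expiresAtMillis) {
        this.customerId = customerId;
        this.listName = listName;
        this.level = level;
        this.expiresAtMillis = expiresAtMillis;
    }

    // Produces a URL-safe string: random IV followed by the ciphertext and GCM tag.
    String encrypt(byte[] key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, iv));
            String plain = customerId + "\n" + listName + "\n" + level + "\n" + expiresAtMillis;
            byte[] sealed = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + sealed.length).put(iv).put(sealed).array();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption unavailable", e);
        }
    }

    // Rejects tokens that were tampered with or that arrive after their timeout.
    static AccessToken decrypt(String token, byte[] key, long nowMillis) {
        byte[] plain;
        try {
            byte[] in = Base64.getUrlDecoder().decode(token);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            throw new SecurityException("invalid token", e);
        }
        String[] parts = new String(plain, StandardCharsets.UTF_8).split("\n", -1);
        AccessToken t = new AccessToken(parts[0], parts[1], parts[2], Long.parseLong(parts[3]));
        if (nowMillis > t.expiresAtMillis) {
            throw new SecurityException("token expired");
        }
        return t;
    }
}
```

The eCommerce site's page code would generate the token and embed it in links; the service only ever sees the opaque string. Because the expiry is inside the sealed payload, a requester cannot extend it without the key.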

A third mitigating factor is that we keep close control over secrets, particularly those that can be used to break the encryption (in other words, security through obscurity). Bletchley Park would not have been nearly so successful at breaking Enigma if they hadn't learned to recognize radio operators. Taking heed of that, none of the information used by the encrypted parameter is provided in plaintext, meaning that any attempts at discovering the key will require heuristic analysis rather than a simple String.contains().

Friday, August 14, 2009

Designing a Wishlist Service: Interface

One thing that has always struck me about web services is how hard they are to invoke from a web browser. SOAP requests are the worst, with the overhead from their envelopes, but even simple XML services leave a lot to be desired, particularly since Internet Explorer, the most popular browser currently in use, doesn't support the E4X extensions to JavaScript. This has led to diverging paths: either a web service is expected to be used in a browser, and is built around POST data and JSON, or it's meant to be called from another server, and is built around an XML protocol.

For the product list service, I wanted both. My main use case is browser-based requests, but I also want to support Flex applications (which prefer XML) as well as server-server requests (such as managing a cart). This led me down the path of REST services.

“REST” seems to be applied to any web service that uses XML and isn't SOAP, so before continuing I should state what it means to me:

  1. GET is used for retrieval, POST for update. REST purists will now be upset with me, saying “what about PUT and DELETE?!?” My answer: when HTML supports PUT and DELETE as form methods, and every browser allows them, then we can talk about using them. Until that time, practice trumps theory.
  2. The URL uniquely and completely identifies the resource and action. Again, REST purists will object to including the action in the URL, but that's a necessity if we can't use PUT and DELETE.
  3. The request and response bodies contain only data. Far too many “RESTful” web services are really XML-RPC in disguise, particularly for update operations.
  4. HTTP status codes are used to signal success or failure. For any response other than 200 (OK), the response body will contain error information.
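
The four rules above can be sketched as a small routing function. The URL layout (/list/customerId/listName/action) and the action names are invented for illustration and are not taken from the service's API; what matters is that the action lives in the URL, retrieval is tied to GET and updates to POST, and the outcome is reported as an HTTP status code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical router: maps (method, path) to the status code the service would return.
final class RequestRouter {
    static final int OK = 200;
    static final int NOT_FOUND = 404;
    static final int METHOD_NOT_ALLOWED = 405;

    // Assumed action names, one set per HTTP method.
    private static final Set<String> READ_ACTIONS =
            new HashSet<String>(Arrays.asList("view"));
    private static final Set<String> UPDATE_ACTIONS =
            new HashSet<String>(Arrays.asList("add", "remove", "rank"));

    static int route(String method, String path) {
        // "/list/c42/birthday/view" splits into ["", "list", "c42", "birthday", "view"]
        String[] parts = path.split("/");
        if (parts.length != 5 || !"list".equals(parts[1])) {
            return NOT_FOUND;
        }
        String action = parts[4];
        if (READ_ACTIONS.contains(action)) {
            return "GET".equals(method) ? OK : METHOD_NOT_ALLOWED;
        }
        if (UPDATE_ACTIONS.contains(action)) {
            return "POST".equals(method) ? OK : METHOD_NOT_ALLOWED;
        }
        return NOT_FOUND;
    }
}
```

A real servlet would do the work for the action before returning 200, and would write error details into the body for the other codes; this sketch shows only the dispatch decision.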

The result is a very simple interface. My desire to support both POST/JSON and XML requests introduces a slight wrinkle, one that's resolved by front-end code that identifies the content type and converts to and from a JavaBean representation.

One thing that REST doesn't contemplate, however, is security. That's the next post.

Thursday, August 13, 2009

Designing a Wishlist Service: Feature Set

I've been referring to this project as a “wishlist,” but it's really a “product” list. Product lists appear everywhere in eCommerce, with different names and purposes: wishlists and registries keep track of products that we'd like others to buy for us; favorites and re-order lists keep track of products that we like to buy for ourselves, and the cart holds products that we're buying right now. Sometimes these lists will be shared with others, sometimes they won't. Some of these applications have different data needs: quantity on the registry and cart, versus ranking on the wishlist and favorites. From a design perspective, which features are important and which aren't? Particularly if I'm planning to sell this product to multiple sites?

Before we can think about what data gets stored, however, we need to think about how the list itself will be identified. One approach is to give each list a unique identifier, and have all requests use that identifier. I rejected this approach, because it requires any potential clients to modify their database schema, adding in a lookup table to associate this unique ID to a customer or whatever data they use to identify the list.

Instead, I decided to use a two-part key of customer ID and wishlist name. Every eCommerce site has some notion of a customer ID, so I'm reusing existing data. By giving wishlists names, access to a particular wishlist can be controlled by page-level code — no need to modify the database. These two key components are managed as character strings, with no interpretation by the service: as long as the site uses reasonable-length identifiers, they will work.

The next question is how to identify each product entry. Here I went with another composite key: product ID and sub-product ID. This is a nod to my past experience at GSI, which used a second-level ID to identify the particular size/color of a base product. Looking at other clothing sites, this appears to be a common practice, and sites that don't use such an ID can simply leave the field empty. Again, the values are defined as strings, without any interpretation by the service.

Finally, we get to the information associated with each product. And here I decided that the less information stored, the better: each entry has fields for rank and quantity, nothing else. This is an application of the Agile YAGNI principle (you ain't gonna need it): I've identified cases where these two fields will be used, but don't know what others might be needed. I do, however, recognize that other data might be needed, so I attached a free-form string (CLOB) to each entry: should a particular site want to store data there, they can. It won't be interpreted by the service, but will be persisted with the rest of the entry.
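
Pulling the last few paragraphs together, the data model might look something like the sketch below. The class and field names are assumptions made for illustration, not the service's actual data transfer objects; the shape follows the description: a two-part list key, a two-part product key whose second part may be empty, and an entry holding only rank, quantity, and an uninterpreted string.

```java
import java.util.Objects;

// Hypothetical list identifier: customer ID plus wishlist name, both opaque strings.
final class ListKey {
    final String customerId;
    final String listName;

    ListKey(String customerId, String listName) {
        this.customerId = customerId;
        this.listName = listName;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ListKey)) return false;
        ListKey that = (ListKey) o;
        return customerId.equals(that.customerId) && listName.equals(that.listName);
    }

    @Override public int hashCode() {
        return Objects.hash(customerId, listName);
    }
}

// Hypothetical product identifier: sites without sub-products leave the second part empty.
final class EntryKey {
    final String productId;
    final String subProductId;

    EntryKey(String productId, String subProductId) {
        this.productId = productId;
        this.subProductId = subProductId == null ? "" : subProductId;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof EntryKey)) return false;
        EntryKey that = (EntryKey) o;
        return productId.equals(that.productId) && subProductId.equals(that.subProductId);
    }

    @Override public int hashCode() {
        return Objects.hash(productId, subProductId);
    }
}

// One list entry: rank, quantity, and a free-form string the service stores but never reads.
final class Entry {
    final EntryKey key;
    int rank;
    int quantity;
    String siteData;

    Entry(EntryKey key, int rank, int quantity, String siteData) {
        this.key = key;
        this.rank = rank;
        this.quantity = quantity;
        this.siteData = siteData;
    }
}
```

Because both keys implement equals and hashCode, they can be used directly as map keys, for example a Map from ListKey to a Map from EntryKey to Entry in an in-memory test double for the database.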

Entries will also have a list of comments attached to them. This feature was driven by a specific collaborative shopping application, although I think it will be useful in multiple places (for example, a shared favorites list in which the customer describes why the product is a favorite). These comments are not associated with the product per se; they are managed via separate service calls.

This brings me to the most interesting (to me) feature of all: how the client will access the service. That deserves a posting of its own.

Monday, August 10, 2009

Why a Wishlist?

From an eCommerce perspective, wishlists are far from the leading edge. Does the world need another one, and more important, is it marketable?

There are several answers to this. While wishlists (or registries) have been a part of eCommerce for many years, it's surprising how few sites actually use them. Perhaps this is because sites tried out a wishlist, discovered that it didn't measurably add to sales, and got rid of it. Yet Amazon continues to enhance its wishlist offering. And in my former role, I talked with many clients that wanted features that our wishlist offering didn't have.

Chief among these desired features was collaboration: not only sharing a list, but allowing other people to update or comment on it. This leads to some interesting questions about managing privacy and control, and I didn't see good answers to those questions in existing products. I think I've got those answers.

Existing wishlist implementations are also tightly integrated with their host site, typically (as with Amazon) accessed via dedicated pages. However, the way that people approach eCommerce is changing: from browser-based collaborative shopping, to the opening of new channels such as social networking sites. To me, the future will be based on federated services, accessed via rich browser apps, and my wishlist service is squarely in that space.

And finally, there's a personal angle: although I've been working with the J2EE stack for 10 years, and even co-developed a servlet-enabled web-server, I've never implemented a production web-application “from scratch.” This project is a way to cement my knowledge, and also produce a template app that can be adapted for other purposes.

Sunday, August 9, 2009

The Paradox of Free Time

It's been a month since my last blog posting. That's a common thing with technology bloggers big and small, often followed by an apologetic posting about how busy work has become. But I'm unemployed, so can't use that excuse … except that's exactly what happened!

OK, there were a couple of vacations interspersed. One was a motorcycle trip, and packing a laptop on a motorcycle is difficult (at least, if you want to keep it dry and unbroken). Another was spent in a cabin in the woods, without Internet access. And while I suppose I could post from the iPhone, you don't want to read that (or more correctly, decipher it). But in reality, through much of July, I was at home and sitting at the computer for at least six hours out of the day (and standing on a ladder the rest of the time).

It turns out there's a paradox to having free time: when I was working for someone else, I would carve out an hour of each day during which I'd write. But now, I have the entire day to schedule as I wish, and I've been spending the time working on other projects. One of those projects is a wishlist service, and starting tomorrow, I'll be posting a series of articles about its design and development.