Thursday, August 27, 2009

Designing a Wishlist Service: Scalability

One of the repeated themes in Founders at Work is how startups were unprepared for scalability issues, particularly in the cases where the company experienced viral growth. In some companies, the founders worked around the clock adding capacity. Viral growth is by its nature unpredictable: you have no idea how fast — or how long — your company will keep growing, and going in you have no idea who is going to like the product. The rest of us, however, should have an intended audience for our product, and can make design decisions based on that audience.

There's a lot of conventional wisdom on designing for scalability, most of which boils down to “cache everything” and “don't maintain session state.” The trouble with conventional wisdom is that it rarely applies to the real world. Caching, for example, is useless if you're constantly changing the data, or if a particular data item is accessed infrequently. And a RESTful web service doesn't store session state by its very nature.

So what usage pattern should we expect from a wishlist service? I'm designing for three types of user:

Wishlist Owner
This person will add and remove items, and perhaps change their rankings. These interactions all happen on human timescales (i.e., several seconds or minutes between changes, then long periods of inactivity), and should involve more writes than reads. For this user, caching is not going to help performance: the list will probably age out of the cache before it's read again, and would need to be constantly purged as items change.
Friends and Family
Wishlist owners may choose to share their wishlist, in the hopes that other people will do the buying. On the surface, it appears that caching might be a good idea: after all, there may be a lot of people accessing the service. But again, we're looking at human timescales: the list will be shared by passing a link, probably in an email. While you may send such an email to a hundred people, they're not going to open it at the same time, so there's no point in caching.
Collaborative Shoppers
One of the original triggers for this project was a collaborative shopping tool, where several people would share a list and make updates (typically comments), using a chat service to broadcast changes. This is one situation where caching would be useful, since there will be a group of people hitting the same data at the same time. But they'll also be updating the data constantly, so any cached object would have a very short lifetime.
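
To put rough numbers on the "human timescales" argument, here is a minimal sketch that counts hits against a single cached list with a fixed time-to-live. The five-minute TTL and the request spacings (one friend opening the link every ten minutes, one shopper request every two seconds) are assumptions chosen for illustration, not measurements, and the sketch ignores writes that would invalidate the entry early.

```python
def cache_hits(request_times, ttl_seconds):
    """Count requests served from a single-entry cache with a fixed TTL."""
    hits = 0
    cached_at = None  # time at which the list was last loaded into the cache
    for t in sorted(request_times):
        if cached_at is not None and t - cached_at < ttl_seconds:
            hits += 1          # entry is still fresh: served from cache
        else:
            cached_at = t      # miss: read from the database and re-cache
    return hits


TTL = 300  # assumed cache lifetime, in seconds

# Assumed access patterns, as request times in seconds.
friends = [i * 600 for i in range(100)]   # one open every 10 minutes
shoppers = [i * 2 for i in range(100)]    # one request every 2 seconds

print(cache_hits(friends, TTL))    # 0
print(cache_hits(shoppers, TTL))   # 99
```

Under these assumptions the friends-and-family traffic never hits the cache, while the collaborative shoppers hit it on every request after the first.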

However, I think that the solution for this scenario is to use caching in front of the application rather than as part of it: either Apache's mod_cache or some sort of caching filter in the application server. To make this work, we need to use URLs that identify the version of the list along with other key data — but this is the RESTful approach anyway.
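
As a sketch of why versioned URLs make a front-side cache safe, the code below models a list whose URL changes on every write, with a plain dictionary standing in for mod_cache or a caching filter. The class names, the `/lists/{id}/v{n}` URL layout, and the `fetches` counter are all hypothetical, invented for this illustration; it is not the service's actual design.

```python
class WishlistService:
    """Hypothetical backend: every write produces a new version and a new URL."""

    def __init__(self, list_id):
        self.list_id = list_id
        self.items = []
        self.version = 0
        self.fetches = 0  # number of requests that actually reached the backend

    def current_url(self):
        return f"/lists/{self.list_id}/v{self.version}"

    def add_item(self, item):
        self.items.append(item)
        self.version += 1
        return self.current_url()  # this is what the chat service would broadcast

    def fetch(self, url):
        # Simplification: the backend only serves the current version.
        if url != self.current_url():
            raise KeyError(url)
        self.fetches += 1
        return list(self.items)


class FrontCache:
    """Stand-in for a cache sitting in front of the application."""

    def __init__(self, backend_fetch):
        self.backend_fetch = backend_fetch
        self.store = {}

    def get(self, url):
        # A versioned URL always names the same content, so an entry
        # never needs to be purged; stale versions simply stop being requested.
        if url not in self.store:
            self.store[url] = self.backend_fetch(url)
        return self.store[url]


service = WishlistService("42")
cache = FrontCache(service.fetch)

url1 = service.add_item("bicycle")
for _ in range(5):          # five shoppers read version 1
    cache.get(url1)

url2 = service.add_item("helmet")
for _ in range(5):          # the same five read version 2
    cache.get(url2)

print(service.fetches)      # 2
```

Ten reads reach the backend only twice, once per version, and the application itself contains no cache-invalidation code.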

By coming up with this picture, I can implement the service as a simple database front-end; I don't need to waste time writing explicit code for scalability. And if it goes viral … well, let's just say I don't expect that, particularly with an eCommerce server gating access.
