
Friday, January 27, 2017

Trusting the Internet: Picking Third-Party Libraries

Many applications today are like the human body:* a relatively small proportion of “in-house” code, leveraged by dozens if not hundreds of third-party libraries — everything from object-relational mappers to a single function that left-pads a string. And that leads to a conundrum: how do you pick the libraries that you include in your project? Or in other words, is it OK to download something from the Internet and make it a fundamental part of your business?

Sometimes, of course, you don't have a choice. If you use the JUnit testing framework, for example, you are going to get the Hamcrest library along with it (and maybe you'll feel some concern that the hamcrest.org domain is no longer registered). But what criteria did you use to pick JUnit?

I was recently faced with that very question, in looking for a library to parse and validate JSON Web Tokens in Java. There are several libraries to choose from; these are the criteria that I used to pick one, from most important to least.

  1. Following the crowd
    If 100,000 projects use a particular library without issue, chances are good that you can too. But how do you know how many projects use a library? For JavaScript projects, npm gives you numbers of downloads; ditto for Ruby projects and the gems they use. Java projects don't have it so easy: while Maven Central does keep statistics on downloads (they're available to package maintainers), that information isn't available to consumers (other than a listing of the top 10 downloads).

    One interesting technique for following the crowd is looking in your local repository, to see if the package is already there as a dependency of another library. If you can create a dependency tree for your project, look at where the candidate lives in the tree: is it close to the root or deep in the weeds? Is it included by multiple other libraries or just one? These are all signals of how the rest of the world views the library.

  2. Documentation
    I believe that care in documentation is a good proxy for care in implementation. Things I want to see are complete JavaDoc and examples. For projects hosted on GitHub, I should be able to understand the library based solely on the README (and there's no reason that a non-GitHub project should omit a README, although I admit to being guilty in that regard).

  3. Author Credibility
    This can be difficult, especially where the “author” is a corporation (although large corporations tend to do their own vetting before letting projects out under the corporate name). In the case of a sole maintainer, I Google the person's name and see what comes up. I'd like to see web pages that demonstrate deep knowledge of the subject (especially for security-related libraries). Even better are slides from a conference, because that implies that the author has at least some recognition in the community.

  4. Issue Handling
    Every library has issues. Does the maintainer respond to them in a reasonable timeframe? A large number of outstanding issues should raise a red flag, as should a maintainer that responds in a non-professional manner. You wouldn't accept that from a coworker (I hope), and by using a package you make the maintainer your coworker.

Once I have decided on a candidate library (or small number of candidates) I try it out for my use case. If it looks good, it becomes part of my application. One thing that I do not do is dig into the library source code.

The promise of open-source software is that you can download the sources and inspect them. The reality is that nobody ever does that — and nobody could, because it would be more than a full-time job. So we choose as best we can, and hope that there isn't a dependency-of-a-dependency-of-a-dependency that's going to hurt us.


* The reference is to the number of bacteria and other organisms that don't share your DNA but live on or in your body. You'll often find a 10:1 (bacteria:human) ratio quoted, but see this article for commentary on the history and validity of that ratio (tl;dr, it's more like 60:40).

Tuesday, September 13, 2016

Incremental Development is Like a Savings Account

In January of 1996 I opened a mutual fund account, investing in an S&P index fund. My initial deposit was $1,000, and I set up an automatic transfer of $200/month from my checking account. This month I looked at my statement and the account had crossed the $100,000 mark.

You can take that as an example of compound returns, and why you should establish a savings account early in life. But it reminded me of a conversation that I had with a project manager a few years ago, after our company had introduced Scrum.

His complaint was “But we could just decide to stop working on the project at any time!”

My response was a customer-value-oriented “Right, if we no longer provide incremental business value, then we should stop.”

But I now think that was the wrong answer, because in my experience most Agile projects don't stop. They continue to be enhanced for years because there's always an increment of value to be had, worth more than the cost of providing it. By comparison, “big push” projects do stop, because there's always another big project to consume the resources. So companies hop from one big push to another, pay for the team(s) to learn the environment, and end up with something that's often less than useful to the client.

Returning to my mutual fund: if I had invested $1,000 and stopped, my investment would be worth approximately $5,000 today — the S&P has returned about 8% annually, even with the 37% downturn in 2008. But the reason that I'm at $100k today is because of the $200 added to the account every month for the past 20 years. Something I could “just stop” at any time.

Saturday, June 4, 2016

Target Fixation

Motorcyclists have a saying: you go where you look. If an animal runs out in front of you, or there's a patch of sand in the middle of the corner, or a Corvette is coming the other way, your first response has to be to look elsewhere. If not, you'll almost certainly hit whatever it is that you didn't want to hit.

Another name for this phenomenon is target fixation, and that name was driven home to me quite literally — and painfully — in a paintball game many years ago. I was slowly and carefully positioning myself to shoot one of the other players, when all of a sudden I felt a paintball hit the middle of my back. I was so fixated on my target that I stopped paying attention to what was around me.

I suspect that target fixation was an enormous help to our hunter-gatherer ancestors stalking their dinner. They would only get one chance to bring down their quarry, and didn't have the benefit of high-powered rifles and telescopic sights. To a modern human, surrounded by opportunities to fixate on the wrong thing, it's not so great.

Physical dangers are one thing, but we're also faced with intellectual dangers. If you focus too closely on the scary thing that's right in front of you, you'll ignore all the pitfalls that lie just beyond. This is a particular concern for software developers, who may adopt and implement a particular design without taking the time to think of the ways that it can fail — or of alternative designs that are simpler and more robust.

For example, you might implement a web application that requires shared state, and become so fixated on transactional access to that state that you don't think about contention … until you start running at scale, and discover the delays that synchronization introduces. If you weren't fixated on concurrent access, you might have thought of better ways to share the state without the need for transactions.

So, how to avoid becoming fixated? In the physical world, where fixation has potentially deadly consequences, training programs focus on prevention via ritual. For motorcyclists, the ritual is “SEE”: search, evaluate, execute. For pilots, there are many rituals, but one that was burned into my brain is aviate, navigate, communicate.

For software development, I think that a preemptive “five whys” exercise is a useful way to avoid design fixation. This exercise is usually used after a problem occurs, to identify the root cause of the problem and potential solutions: you keep asking “why did this happen” until there are no more answers. Recast as a preemptive exercise, it is meant to challenge — and ultimately validate — the assumptions that underlie your design.

Returning to the concurrency example, the first question might be “why do I want to prevent concurrent access?” One possible answer is “this is inventory data, and we don't want two customers to buy the last item.” That could lead to several other questions, such as “do I need to use a database transaction?” and “do I need to make that guarantee at this point in the process?”

The chief danger in this exercise is “analysis paralysis,” which is itself a form of target fixation. To move forward, you must accept that you are making assumptions, and be comfortable that they're valid assumptions. If you fixate on the possibility that your assumptions are invalid, you'll never move.

You also need to recognize that, while target fixation is often dangerous, it can have a positive side: preventing you from paying attention to irrelevant details.

I had a real-world experience of this sort a few weeks ago, while riding my motorcycle on a twisting country road: I saw a pickup truck coming the other way and not keeping to his lane. With a closing speed in excess of 100 miles per hour there wasn't much time to make a decision, and not many good decisions to make. I could continue as I was going, assuming that the driver would see me and be able to keep within his lane; if I was wrong in that assumption, my trip would be over. I could get on the brakes hard, now, but would come to a stop at the exact point where the pickup would leave his lane while exiting the corner.

My best option was to stop just past the apex of the corner, which would be where the pickup was most likely to be within his lane. I fixated on that spot, and let the muscle memory of 100,000+ miles balance the braking and turning forces necessary to get me there. I have no idea how close the truck came to hitting me; my riding partner said that it was an “oh shit” moment. But once I picked my destination, the pickup truck and everything around me simply disappeared.

Which leads me to think that there might be another name for the phenomenon: “flow.”

Saturday, April 30, 2016

Taming Maven: Transitive Dependency Pitfalls

Like much of Maven, transitive dependencies are a huge benefit that brings with them the potential for pain. And while I titled this piece “Taming Maven,” the same issues apply to any build tool that uses the Maven dependency mechanism, including Gradle and Leiningen.

Let's start with definitions: direct dependencies are those listed in the <dependencies> section of your POM. Transitive dependencies are the dependencies needed to support those direct dependencies, recursively. You can display the entire dependency tree with mvn dependency:tree; here's the output for a simple Spring servlet:

[INFO] com.kdgregory.pathfinder:pathfinder-testdata-spring-dispatch-1:war:1.0-SNAPSHOT
[INFO] +- javax.servlet:servlet-api:jar:2.4:provided
[INFO] +- javax.servlet:jstl:jar:1.1.1:compile
[INFO] +- taglibs:standard:jar:1.1.1:compile
[INFO] +- org.springframework:spring-core:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-asm:jar:3.1.1.RELEASE:compile
[INFO] |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- org.springframework:spring-beans:jar:3.1.1.RELEASE:compile
[INFO] +- org.springframework:spring-context:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-aop:jar:3.1.1.RELEASE:compile
[INFO] |  |  \- aopalliance:aopalliance:jar:1.0:compile
[INFO] |  \- org.springframework:spring-expression:jar:3.1.1.RELEASE:compile
[INFO] +- org.springframework:spring-webmvc:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-context-support:jar:3.1.1.RELEASE:compile
[INFO] |  \- org.springframework:spring-web:jar:3.1.1.RELEASE:compile
[INFO] \- junit:junit:jar:4.10:test
[INFO]    \- org.hamcrest:hamcrest-core:jar:1.1:test

The direct dependencies of this project include servlet-api version 2.4 and spring-core version 3.1.1.RELEASE. The latter has a dependency on spring-asm, which in turn has a dependency on commons-logging.

In a real-world application, the dependency tree may include hundreds of JAR files with many levels of transitive dependencies. And it's not a simple tree, but a directed acyclic graph: many JARs will share the same dependencies — although possibly with differing versions.

So, how does this cause you pain?

The first (and easiest to resolve) pain is that you might end up with dependencies that you don't want. For example, commons-logging. I don't subscribe to the fear that commons-logging causes memory leaks, but I also use SLF4J, and don't want two logging facades in my application. Fortunately, it's (relatively) easy to exclude individual dependencies, as I described in a previous “Taming Maven” post.

The second pain point, harder to resolve, is what, exactly, is the classpath?

A project's dependency tree is the project's classpath. Actually, “the” classpath is a bit misleading: there are separate classpaths for build, test, and runtime, depending on the <scope> specifications in the POM(s). Each plugin can define its own classpath, and some provide a goal that lets you see the classpath they use; mvn dependency:build-classpath will show you the classpath used to compile your code.

This tool lists dependencies in alphabetical order. But if you look at a generated WAR, they're in a different order (which seems to bear no relationship to how they're listed in the POM). If you're using a “shaded” JAR, you'll get a different order. Worse, since a shaded JAR flattens all classes into a single tree, you might end up with one JAR that overwrites classes from another (for example, SLF4J provides the jcl-over-slf4j artifact, which contains re-implemented classes from commons-logging).

Compounding the classpath-ordering problem is the possibility of version conflicts. This isn't an issue for the simple example above, but for real-world applications that have deep dependency trees, there are bound to be cases where dependencies-of-dependencies have different versions. For example, the Jenkins CI server has four different versions of commons-collections in its dependency tree, ranging from 2.1 to 3.2.1 — along with 20 other version conflicts.

Maven has rules for resolving such conflicts. The only one that matters is that direct dependencies take precedence over transitive. Yes, there are other rules regarding depth of transitive dependencies and ordering, but those only help you discover why you're getting the wrong version; they won't help you fix the problem.

The only sure fix is to lock down the version, either via a direct dependency, or a dependency-management section. This, however, carries its own risk: if one of your transitive dependencies requires a newer version than the one you've chosen, you'll have to update your POM. And, let's be honest, the whole point of transitive dependencies was to keep you from explicitly tracking every dependency that your app needs, so this solution is decidedly sub-optimal.
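
As a minimal sketch (using commons-collections as the example; the version shown is whatever your application actually needs, not a recommendation), the lock-down in a dependency-management section looks like this:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>commons-collections</groupId>
            <artifactId>commons-collections</artifactId>
            <version>3.2.1</version>
        </dependency>
    </dependencies>
</dependencyManagement>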

A final problem — and the one that I consider the most insidious — is directly relying on a transitive dependency.

As an example, I'm going to use the excellent XML manipulation library known as Practical XML. This library makes use of the equally excellent utility library KDGCommons. Having discovered the former, you might also start using the latter — deciding, for example, that its implementation of parallel map is far superior to others.

However, if you never updated your POM with a direct reference to KDGCommons, then when the author of PracticalXML decides that he can use functions from Jakarta commons-lang rather than KDGCommons, you've got a problem. Specifically, your build breaks, because the transitive dependency has disappeared.

You might think that this is an uncommon situation, but it was actually what prompted this post: a colleague changed one of his application's direct dependencies, and his build started failing. After comparing dependencies between the old and new versions we discovered a transitive dependency that had disappeared. Adding it back as a direct dependency fixed the build.

To wrap up, here are the important take-aways:

  • Pay attention to transitive dependency versions: whenever you change your direct dependencies, you should run mvn dependency:tree to see what's changed with your transitives. Pay particular attention to transitives that are omitted due to version conflicts.
  • If your code calls it, it should be a direct dependency. To plug another of my creations: the PomUtil dependency tool can help you discover those.

Saturday, April 23, 2016

Technical Debt

Like many software developers in the 21st century, I use the term “technical debt” in a negative way: it's the ever-accumulating cruft in your system that stands in the way of adding new features. As technical debt increases, the work takes ever longer, until you reach a point where forward progress ceases.

This view of technical debt equates it to a credit card: unless you pay your balance in full each month, you're charged interest. If you only make the minimum payment, that interest accrues and it will take you years to pay off the card. If you make the minimum payment and keep charging more, you may never get out of debt. Eventually, after maxing out several cards, you'll have to declare bankruptcy.

But that's a very puritanical view of debt, and it's not a view shared by everyone.

For a person with a business-school background, debt is a tool: if you can float a bond at 5% to build a factory that gives you a 10% boost in income, then you should do that (usually — there are other factors to consider, such as maintenance and depreciation). More important, you're not going to pay that bond off before it's due; doing so would negate the reasons that you issued it in the first place.

Which means that the term “technical debt” probably doesn't have the same connotations to your business users as it does to you. In fact, using that term may be dangerous to the long-term prospects of your project. If you say “we can release early but we'll add a lot of technical debt to do so,” that's a no-brainer decision: of course you'll take on the debt.

I think a better term is total cost of ownership (TCO): the amount you pay to implement features now, plus the amount you will pay to add new features in the future. For example, “we can release this version early, but we'll add three months to the schedule for the next version.”

Which may still mean that you cut corners to release early, and probably won't stave off demands to release the next version early as well. But at least you'll be speaking the same language.

Sunday, March 1, 2015

Developing in the Clouds

Deploying to the cloud — whether to a platform-as-a-service provider such as Heroku, or an infrastructure-as-a-service provider such as Amazon EC2 — is commonplace, particularly for startups that don't want to invest in infrastructure, or established companies that need resources to handle operational spikes. But not much has been written about using a cloud-based host as a development platform. I just wrapped up a six-month project where my primary development machine was an Amazon EC2 instance, and here are a few of my thoughts.

I'm going to start with the negatives:

  • Latency

    I live in Philadelphia and was working on an Amazon EC2 instance in Ashburn, Virginia, a distance of approximately 150 miles — at the speed of light, roughly a millisecond. However, there are multiple network hops between me and the server, all of which add up to a round-trip time (via ping) of roughly 40 milliseconds. If you confine yourself to text, that's nearly unnoticeable. If you run the X Window System, it's unbearable. If your workflow is GUI-intensive, cloud-based development might not be a good choice (although I consider VNC quite acceptable when using an IDE for development).

  • Capability

    My desktop runs an Ivy Bridge Core i7 with 32 Gb of RAM. The t2.medium instance that I used for remote development has baseline performance of roughly 40% of an undisclosed Xeon and only 4 Gb of RAM. As it turns out, that's sufficient for many development tasks, especially with a rapid-turnaround platform such as Node.JS. If you have big compiles, you can always fire up a c4.8xlarge with a Haswell Xeon, 60 Gb of RAM, and disk throughput that's far better than your desktop SSD.

  • Cost

    Mind you, that c4.8xlarge will cost you: as of this date, $1.68 per hour or fraction thereof. On another project, a colleague fired up a cluster of these instances and forgot to shut them down when he left the company. A month later the IT department gave me a call to ask if we really needed them, because they were costing us $5,000 a month. By comparison, the t2.medium instance costs $0.052 per hour, or $456 per year. More than a decent developer desktop on a three-year depreciation schedule, but not that bad in the larger scheme.

  • Security

    This is the big one: if you're going to run machines in the cloud, you need to have at least a baseline knowledge of Internet security (or hire someone who does). Put simply, you will be attacked. To apply some numbers to that statement, I started an EC2 instance that exposed SSH and HTTP, and left it running for 24 hours. The first attempt to break into SSH happened within a half hour; there were 39 attempts over the course of the test. Yandex started exploring the site within 12 hours, followed by other web scrapers.*

    Basic security rules will get you a long way: don't use dictionary passwords for any exposed service (and for SSH, don't use passwords at all), and don't expose any unprotected services to the outside world. Use a firewall that checks origin IP. If you're running on AWS, this feature is built into security groups. If you need to share access to your instance, or access it from locations that you don't know in advance, consider a VPN.

    This is also a case where I think security by obscurity is useful — at least as a first line of defense. Most scannerbots and web crawlers look at well-known ports: 22 for SSH; 80, 8000, and 8080 for HTTP. Exposing your prototype website on port 21498 isn't going to stop a dedicated attacker (and there are bulk port scanners out there), but it will prevent your site's content from showing up in a search index before you're ready.

And now, the positives:
  • Availability

    The ability to access a cloud-based host from anywhere, at any time, gives you an enormous amount of flexibility in how you do work. There's no need to lug a laptop home every night, and if the roads are snow-covered and filled with stopped traffic, you can easily work from home. With tools like screen or VNC, you can have a session that's always set up just how you want it, and which can run programs while you're not connected. Plus, it's easy to collaborate: unlike a personal computer, a cloud instance can be shared by multiple users.

  • Consistency

    I don't know about you, but after a few years my computers all accumulate a significant amount of cruft: libraries or entire applications that I installed for a particular project and no longer need, along with various versions of standard tools, some of which are expected by the OS itself. Multiply that cruft by the number of members on your team, and add a fudge factor for different shell configuration files. None of which matches your production (or even test) environment. It's a mess.

    To me, this is the biggest benefit of developing in the cloud: you can be certain that all machines are configured alike — or at least start out that way. Tools such as Chef and Puppet will take you from a base image to a fully configured server in one step. With Amazon, after you've configured the server once, you can create a private AMI and stamp out as many instances as you want.

  • Disposability

    The flip side of starting servers quickly is disposing of them when no longer needed. There's no reason to patch or update your machine; that just accumulates cruft. This mantra has long been used by operations teams: I was out riding with a friend last fall when his beeper went off; we pulled over, he discovered that one of his production servers was having trouble, shut it down, and started a replacement. Analysis could wait for later; there was no need to try to make an emergency patch.

Finally, lessons learned:
  • Experience Helps — a Lot

    I've been using AWS since 2009, and have been responsible for administering my personal Linux machine since well before that. However, the limits of my knowledge became apparent when the company hired Eric, a “cloud architect” who lived up to the name. He quickly had us running in an isolated virtual private cloud (VPC), with separate VPCs for our test, integration, and production environments, OpenVPN to secure access, LDAP to hold credentials, and automatic deployments from our CI server. If you can find such a person, hire him or her; it will save a lot of time.

  • Disposability Changes How You Work

    I got a hint of this several years ago, when using AWS to test a distributed application: we would prepare a test, start up a dozen servers, run the test, then shut them down. It allowed a form of testing that was quite simply impossible just a few years earlier; no company that I've worked for had a spare closet full of machines, and even if they did, configuration would require hours.

    As developers, we set up our machines just the way we want them; it takes a long time, and no two developers have the same configuration. But if you have the base development image preconfigured, firing up a new machine becomes a process of copying your personal configuration files and checking out your workspace. You learn to let go of “my machine.”


* I've written elsewhere about the questionable practices of web crawlers. If you have a site that's not ready for primetime, don't expose it to the Internet on a well-known port.

Monday, June 2, 2014

Is that SSD Really Helping Your Build Times?

Update: I ran similar tests with full-disk encryption.

As developers, we want the biggest bad-ass machine that we can get, because waiting for the computer is so last century. And part of a bad-ass machine is having a solid-state drive, with sub-millisecond latency. Spinning platters covered in rust are not just last-century, they're reminiscent of the industrial age. But do we really benefit from an SSD?

This post emerged from a conversation with a co-worker: he was surprised that I encrypted my home directory, because of the penalty it caused to disk performance. My response was that I expected most of my files to be living in RAM, unencrypted, in the disk buffer. That led to a discussion about whether an SSD provided any significant benefit, given enough RAM to keep your workspace in the buffer cache. Turns out I was wrong about unencrypted data in the cache, but not about the SSD.

I was confident about the latter because a year ago, when I built my then-seriously-badass home computer (32Gb RAM — because I could), I ran some performance comparisons against my then-seriously-pathetic vintage 2002 machine. The new machine blew away the old, but much of the performance gain seemed to come from CPU-related items: faster clock speed, faster memory, huge L1 and L2 caches, and so on. Once CPU time was deducted, the difference between spinning rust and SSD wasn't that big.

I started to write a post at that time, but went down a rathole of trying to create a C program that could highlight the difference in L1/L2 cache. Then the old machine suffered an “accident,” and that was the end of the experiments.

Now, however, I have a far simpler task: quantify the difference that an SSD makes to a developer's workload. Which can be rephrased as “will buying an SSD speed up my compile times?” This is particularly important to me right now, because I'm on a project where single-module compiles take around a minute, and full builds are over 30.

Here's the experimental protocol:

Hardware:

  • Thinkpad W520: 4-core Intel Core i7-2860QM CPU @ 2.50GHz, 800MHz FSB. A bad-ass laptop when I got it (I just wish it wasn't so heavy).
  • 8 GB RAM, 8 MB L2 cache
  • Intel “320 Series” SSD, 160 Gb, formatted as ext4. This is not a terribly fast drive, but with an average access time of 0.2 ms, and an average read rate of 270 MB/sec (as measured by the Gnome disk utility), it blows away anything with a platter.
  • Western Digital WD2500BMVU, 250 GB, 5400 RPM, formatted as ext4, accessed via USB 2.0. This is a spare backup drive; I don't think I own anything slower unless I were to reformat an old Apple SCSI drive and run it over a USB-SCSI connector (and yes, I have both). Average access time: 17.0 ms; average read rate: 35 MB/sec.
  • Xubuntu 12.04, 3.2.0-63-generic #95-Ubuntu SMP.

Workload:

  • Spring Framework 3.2.0.RELEASE. A large Java project with lots of dependencies, this should be the most disk-intensive of the three sample workloads. The build script is Gradle, which downloads and caches all dependencies.*
  • Scala 2.11.1. I'm currently working on a Scala project, and the Scala compiler itself seemed like a good sample. The main difference between Scala and Java (from a workload perspective) is that the Scala compiler does a lot more CPU-intensive work; in the office I can tell who's compiling because their CPU fan sounds like a jet engine spooling up. The build script is Ant, using Ivy to download and cache dependencies.**
  • GNU C Compiler 4.8.3. Added because not everyone uses the JVM. I didn't look closely at the makefile, but I'll assume that it has optimization turned up. Disk operations should be confined to reading source files, and repeated reads of header files.

Test conditions:

General configuration:

  • Each test is conducted as a distinct user, with its own home directory, to ensure that there aren't unexpected cross-filesystem accesses.
  • Each build is run once to configure (gcc) and/or download dependencies.
  • Timed builds are run from a normal desktop environment, but without any other user programs (e.g., browser) active.
  • Timed builds run with network (wifi and wired) disconnected.
  • The Spring and Scala times are an average of three runs. The gcc time is from a single run (I didn't have the patience to do repeated multi-hour builds, just to improve accuracy by a few seconds).

Per-test sequence:

  • Clean build directory (depends on build tool).
  • Sync any dirty blocks to disk (sync).
  • SSD TRIM (fstrim -v /)
  • Clear buffer cache (echo 3 > /proc/sys/vm/drop_caches)
  • Execute build, using time.

And now, the results. Each entry in the table contains the output from the Unix time command, formatted real / user / sys. I've converted all times to seconds, and rounded to the nearest second. The only number that really matters is the first, “real”: it's the time that you have to wait until the build is done. “User” is user-mode CPU time; it's primarily of interest as a measure of how parallel your build is (note that the JVM-based builds are parallel, the gcc build isn't). “Sys” is kernel-mode CPU time; it's included mostly for completeness, but notice the difference between encrypted and non-encrypted builds.

                          Spring Framework      Scala               GCC
  Unencrypted SSD         273 / 527 / 10        471 / 1039 / 13     6355 / 5608 / 311
  Encrypted SSD           303 / 534 / 38        491 / 1039 / 29     6558 / 5682 / 400
  USB Hard Drive          304 / 525 / 11        477 / 1035 / 14     6462 / 5612 / 311
  Encryption Penalty      11 %                  4 %                 3 %
  Spinning Rust Penalty   11 %                  1 %                 2 %

Do the numbers surprise you? I have to admit, they surprised me: I didn't realize that the penalty for encryption was quite so high. I haven't investigated, but it appears that ecryptfs, as a FUSE filesystem, does not maintain decrypted block buffers. Instead, the buffered data is encrypted and has to be decrypted on access. This explains the significantly higher sys numbers. Of course, losing my laptop with unencrypted client data has its own penalty, so I'm willing to pay the encryption tax.

As for the difference between the SSD and hard drive: if you look at your drive indicator light while compiling, you'll see that it really doesn't flash much. Most of a compiler's work is manipulating data in-memory, not reading and writing. So the benefit that you'll get from those sub-millisecond access times is just noise.

On the other hand, if you're doing data analysis with large datasets, I expect the numbers would look very different. I would have killed for an SSD 3-4 years ago, when I was working with files that were tens of gigabytes in length (and using a 32 GB Java heap to process them).

Finally, to borrow an adage from drag racers: there's no substitute for RAM. With 8 GB, my machine can spare a lot of memory for the buffer cache: free indicated 750 Mb after the Scala build, and several gigabytes after the gcc build. Each block in the cache is a block that doesn't have to be read from the disk, and developers tend to hit the same blocks over and over again: source code, the compiler executable, and libraries. If you have enough RAM, you could conceivably load your entire development environment with the first build Monday morning, and not have to reload it all week.

At least, that's what I told myself to justify 32Gb in my home computer.


* I picked this particular version tag because it mostly builds: it fails while building spring-context, due to a missing dependency. However, it spends enough time up to that point that I consider it a reasonable example of a “large Java app.” I also tried building the latest 3.X tag, but it fails right away due to a too-long classname. That may be due to the version of Groovy that I have installed, but this experience has shaken my faith in the Spring framework as a whole.

** Scala has issues with long classnames as well, which means that the build will crash if you run it as-is on an encrypted filesystem (because encryption makes filenames longer). Fortunately, there's an option to tell the compiler to use shorter names: ant -Dscalac.args='-Xmax-classfile-name 140'

What if you don't have a lot of memory to spare? I also tried running the Spring build on my anemic-when-I-bought-it netbook: an Intel Atom N450 @ 1.66 GHz with 1GB of RAM and a 512KB L2 cache. The stock hard drive is a Fujitsu MJA2250BH: 250 GB, 5400 RPM, an average read rate of 72 MB/sec, and an average access time of 18 ms. I also have a Samsung 840 SSD that I bought when I realized just how anemic this machine was, thinking that, if nothing else, it would act as a fast swap device. However, it doesn't help much: with the stock hard drive, Spring builds in 44 minutes; with the SSD, 39. An 11% improvement, but damn! If you look at the specs, that netbook is a more powerful machine than a Cray-1. But it's completely unusable as a modern development platform.

Monday, February 3, 2014

Coder vs Engineer

Stack Overflow is a fabulous resource for programmers. When I have programming questions, the first page of Google results is filled with links to its pages, and they usually have the answers I need. So why do I often feel depressed after browsing its questions?

The answer came to me this weekend: it's a hangout for coders, not engineers.

The question that prompted this revelation was yet another request for help with premature optimization. The program in question was tracking lap times for race cars, and the OP (original poster, for those not familiar with the acronym) was worried that he (she?) was extracting the list of cars and sorting it after every update. He saw this as a performance and garbage-collection hit, that would happen “thousands of times a second.”

That last line raised a red flag for me: I'm not a huge race fan, but I can't imagine why you would expect to update lap times so frequently. The Daytona 500, for example, has approximately 40 cars, each of which takes approximately a minute per lap. Even if they draft, you have a maximum of 40 updates per second, for a rather small set of objects.

To me, this is one of the key differences between a coder and an engineer: not attempting to bound the problem. Those interview questions about counting gas stations in Manhattan are all about this. You don't have to be exact, but if you can't set a bound to a problem, you can't find an effective solution. Sure, updating and sorting thousands of cars, thousands of times a second, that might have performance issues. But that's not how real-world races work.

Another difference is that, having failed to bound the problem (indeed, even to identify whether there is a problem), the coder immediately jumps to writing code. And that was the case for the people who answered this particular question: they created a variety of solutions that all solved some interpretation of the OP's problem.

And I think that's what really bothers me: that coders will interpret a question in whatever way makes their coding easiest. This was driven home by a recent DZone puzzle. The question was how to remove duplicates from a linked list, “without using a buffer.” It's a rather poorly-worded question: what constitutes a buffer?

There were a few people who raised that question, but by far the majority started writing code. And some of the implementations were quite inventive in their interpretation of the question. The very first response limited the input list to integers, and used a bitset to track duplicates (at the worst, that would consume nearly 300Mb of RAM — a “buffer” seems modest by comparison). Another respondent seemed to believe that a Java ArrayList satisfied the “linked list” criteria.

Enough with the rant. Bottom line is that this industry needs to replace coders with engineers: people who take the time to understand the problems that they're tasked to solve. Before writing code.

Friday, October 25, 2013

Deprecation and Technical Debt

API designers deprecate classes and methods as a signal that they shouldn't be used. Usually because there's a better way to do the same thing, but occasionally (as in the case of Java's Date constructors) because the existing methods have unintended consequences. Deprecation is the first step in controlled breakage of backwards compatibility, a warning for users that they can't rely on the functionality always being present.

Of course, in the real world, deprecated functionality rarely gets removed. The Date constructors have been with us since Java 1.0.x and show no sign of going away. Indeed, of all the major frameworks that I've used, the only one that takes deprecation seriously is Spring — and even then, only minor things seem to get removed; you can still compile a program written for Spring 1.x with only a few changes.

However, leaving deprecated code in your codebase is a form of technical debt. You run the risk that the functionality will one day be removed, and that you'll have to scramble to replace it. But more important, if your code contains a mix of deprecated and non-deprecated features, you have to expend extra effort to maintain it.

The worst case that I've seen was a company that built websites for clients. One of their features was a content management system: a tool that allowed the client to create their own content, inserted into the page when various conditions were met. Standard stuff, but this company actually had three incompatible content management systems, all home-grown. Each was developed to replace the previous.

The reason that all three systems remained in active use was that there was a cost to convert to the newer framework, and neither the company nor their clients were willing to pay that cost. In effect, they chose to leave deprecated code in their codebase.

Since the frameworks were all built in-house, there was no reason to worry about them going away unexpectedly. But what did go away were the developers who had built the frameworks, and with them the knowledge of how the frameworks worked. Although the older frameworks didn't have a lot of bugs, when one did appear it was a crisis situation, with a mad search to find someone with enough knowledge to track it down and fix it. Even without bugs, the lack of knowledge meant that every enhancement to the client site took extra time, as each new developer learned enough to avoid breaking the CMS.

This situation continued for as long as I worked at that company; for all I know, they've added a fourth or fifth system to the mix. Although the old systems imposed an unnecessary cost on each project, that incremental cost was always lower than the cost to upgrade. But over time, the costs added up, and the client paid much more than an outright conversion (which the company didn't mind, as that increased revenue).

Worse, developers did all they could to avoid contact with the older systems, or to have their names on any checkins associated with it. Getting pulled onto a CMS problem meant that your own project schedules would slip, and you'd have to listen to upset project managers waste more of your time. Much easier to let the other team sink or swim on their own.

I'm not saying that ignoring deprecations will create dysfunction and discord within your development team. But do you want to risk it?

Thursday, August 2, 2012

Taming Maven: A Local Repository Server

While Maven's automatic dependency retrieval is a great feature, it does have limitations. And one of the biggest of those limitations is getting access to local projects. Or, really, to any projects that aren't found in the Maven Central repository.

Returning to the archetypal IT department of my previous posts, it's really painful if, before you can start working on your own project, you first have to check out and build multiple dependent projects. Worse is if those dependencies are under development, and you have to rebuild on a regular basis.

The first part of eliminating that pain is to run a local repository server. This server is a place to deploy your local builds, including any third-party software or local patches you've made to open source software. It can also act as a proxy for Maven Central and other external repositories, protecting you from Internet outages and keeping your POMs free of <repositories> entries.

You can create a Maven repository server using a bare-bones Apache web-server: all you need to do is make sure that its document root follows the repository format. However, there are better options: Nexus and Artifactory are both purpose-built servers for managing Maven artifacts, and both come in open-source variants that you can run for free. If you don't have a local machine, or don't want the hassle of administering it, Artifactory provides cloud hosting of your repository (for a fee). Sonatype doesn't go quite that far, instead providing a pre-built EC2 image (hopefully updated since that post).

Once you've got the repository server running, you need to configure Maven to access it. The simplest approach is to add your local server as a mirror for Maven Central, as described here. Note that you cannot simply add a <repositories> entry to your parent POM, as you need to deploy that POM to the repository server.
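
Here's a minimal sketch of that mirror configuration, which goes in each user's (or the global) settings.xml; the id and URL are placeholders for your own server:

<settings>
    <mirrors>
        <mirror>
            <id>local-repo</id>
            <name>Local repository server</name>
            <url>http://repo.example.com/maven/</url>
            <mirrorOf>central</mirrorOf>
        </mirror>
    </mirrors>
</settings>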

Now you face the question of how to deploy your builds. Both Nexus and Artifactory give you a web interface to manually upload artifacts, but it's far easier to use the Maven deploy goal to deploy directly from your build (using an HTTP connection to the server). Of course, that raises the issue of credentials: do you give each developer his/her own credentials (which are stored in $HOME/.m2/settings.xml), or use a single set of credentials for all?

I'm in favor of the latter: use one set of credentials, stored either in each user's personal settings file, or in the global settings file. While that may make some people cringe, the security risk is non-existent: the repository server is write-only, and it will control where you write. As long as you don't pass out the actual admin login, or use SCP to deploy, the worst a disgruntled ex-employee can do is upload new builds.
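
As a sketch of the pieces involved (all ids, URLs, and credentials here are placeholders): the deploy goal reads a <distributionManagement> section from the POM, and matches its repository ids against <server> entries in settings.xml to find the credentials.

<!-- in the parent POM -->
<distributionManagement>
    <repository>
        <id>releases</id>
        <url>http://repo.example.com/releases/</url>
    </repository>
    <snapshotRepository>
        <id>snapshots</id>
        <url>http://repo.example.com/snapshots/</url>
    </snapshotRepository>
</distributionManagement>

<!-- in settings.xml, shared by the whole team -->
<servers>
    <server>
        <id>releases</id>
        <username>deploy</username>
        <password>shared-deploy-password</password>
    </server>
    <server>
        <id>snapshots</id>
        <username>deploy</username>
        <password>shared-deploy-password</password>
    </server>
</servers>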

And even that minor risk can be eliminated if your developers never have to type “deploy” — and they'd be happier too. Instead, set up a continuous integration server that examines your source repository for changes and automatically builds and deploys the changed projects. At least for snapshot builds, this ensures that all of your developers will be using the latest codebase, without any manual intervention whatsoever.

Wednesday, August 1, 2012

Taming Maven: Dependency Management

Once you have a parent POM, you can add a <dependencyManagement> section. In my view, it's often more trouble than it's worth (in fact, this blog series started out as a single post suggesting properties were usually a better choice). In the worst case, a dependency management section can prevent your child builds from seeing new dependencies. There are, however, some cases where it is useful to prevent Maven from using unwanted transitive dependencies.

For those who haven't seen <dependencyManagement> in use, here's an abbreviated example:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>${springframework.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-webmvc</artifactId>
            <version>${springframework.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>

With the dependency version specified in the parent's dependency management section, it can be omitted in the child's <dependencies>:

<dependencies>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-webmvc</artifactId>
    </dependency>
</dependencies>

If that was all there was to it, there would be very little reason to use a dependency management section, versus defining version properties in the parent — it might actually be a net increase in the size of your POMs. But dependency management goes a bit deeper: it will override the transitive dependencies associated with your direct dependencies. This bears further examination.

If you use the Spring framework, you know that it's broken into lots of pieces. You don't need to specify all of these as direct dependencies, because each component brings along transitive dependencies (I haven't verified this, but I think spring-webmvc is the only direct dependency you need for a Spring web-app).

Now consider the case where the parent POM has a dependency management section that lists all of the Spring components, and gives them version 3.0.4.RELEASE, while a child uses version 3.1.1.RELEASE, and just specifies spring-webmvc as a direct dependency. When Maven builds the child and retrieves transitive dependencies, it will ignore the 3.1.1.RELEASE implied by the direct dependency, and instead load the 3.0.4.RELEASE versions specified by the parent.

This is rarely a Good Thing. Sometimes it won't cause an actual bug: if you are using features of the new version that haven't changed since the old version, you have nothing to worry about. But more often, you'll get a NoSuchMethodError thrown at runtime. Or worse, the method is present but does something unexpected. These sorts of bugs can be incredibly painful to track down.

Version properties, of course, go a long way toward keeping these errors from occurring. But some projects will need to specify their own version properties, often because they're trying out some new functionality.

There is, however, one case where a dependency management section is useful: excluding transitive dependencies. Again using Spring as an example: it will use either commons-logging or SLF4J for its internal logging; at runtime, it figures out which is available. However, as of this writing (version 3.1.2.RELEASE), spring-core has a non-optional transitive dependency on commons-logging. Which means that your program will also have a transitive dependency on commons-logging — and if your program is a web-app, you'll find commons-logging in its deployed WAR whether you want it or not.

Perhaps some day the Spring developers will change the scope of this dependency to provided. Until then, if you don't want commons-logging you need to manually break the transitive dependency with an exclusion:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>${springframework.version}</version>
    <exclusions>
        <exclusion>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Without a dependency management section in the parent POM, you would have to repeat that exclusion in every project POM. Miss just one, and the transitive dependency appears. Move the exclusion into the parent's dependency management section, and it applies to all children. Of course, this locks down the version number; any child projects that need a different version must specify a direct dependency on spring-core.
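
For completeness, such a child override is just a normal direct dependency with an explicit version, which takes precedence over the version in the parent's dependency management section (the version shown is purely illustrative):

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>3.1.2.RELEASE</version>
</dependency>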

Bottom line: don't use a <dependencyManagement> section unless you absolutely have to. And even then, keep it as small as you possibly can.

Tuesday, July 31, 2012

Taming Maven: Parent POMs

When faced with a development environment that has dozens or hundreds of distinct projects, version properties are only a first step. Even if you only have to look at one place in each POM, it's a headache to update versions. As a solution, you can specify all of your common dependencies and configuration in a “parent” POM, which is then referenced by each project.

Before continuing, I want to clear up a misconception: parent POMs are not the same as multi-module projects, even though they're described together in the Maven documentation. True, the two are often seen together: a multi-module project almost always uses a parent POM to bring order to its modules. But the parent need not be tied to the modules; an example is the Sonatype parent POM, which is used by every open-source project that deploys to Maven Central via the Sonatype repository.

A parent POM looks like a normal POM, but specifies a packaging type of “pom”:

<groupId>org.sonatype.oss</groupId>
<artifactId>oss-parent</artifactId>
<version>7</version>
<packaging>pom</packaging>

The children of this POM then reference it via a <parent> element:

<parent>
    <groupId>org.sonatype.oss</groupId>
    <artifactId>oss-parent</artifactId>
    <version>7</version>
</parent>

<groupId>net.sf.kdgcommons</groupId>
<artifactId>kdgcommons</artifactId>
<version>1.0.7-SNAPSHOT</version>

So what goes into the parent POM? Version properties, of course; one of the main reasons for using a parent POM is to ensure that all projects use the same set of dependencies. Also common plugin configuration, such as the compiler, test runner, and any reporting plugins. Finally, any common environment configuration, such as repositories and deployment configuration.

What shouldn't go in the parent POM is an actual <dependencies> section, because that will cause all of your projects to have the same set of dependencies, whether they need them or not. Nor should you add plugins that only run for one or a few projects (although by all means specify the plugin versions). And finally, if your projects use an <scm> section, it needs to go in the individual project POMs — I learned the hard way that Maven won't substitute project-specific values into a section defined by the parent.
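
Putting that together, here's a sketch of the sort of shared configuration that belongs in a parent POM: version properties, plus a <pluginManagement> section that pins plugin versions and configuration without adding those plugins to every child's build (the specific versions and compiler settings are just placeholders):

<properties>
    <org.springframework.version>3.1.1.RELEASE</org.springframework.version>
    <junit.version>4.10</junit.version>
</properties>

<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.5.1</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
        </plugins>
    </pluginManagement>
</build>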

The biggest complaint that I've heard about parent POMs is “if we change a dependency, then we have to update all the projects that use that parent!” That's true: the parent is a released artifact, just like the projects themselves; a child specifies a particular version of its parent, and is not automagically updated when the parent changes (unless you use snapshot versions for the parents).

My answer to this complaint is “either it matters or it doesn't, and either way the parent helps you.” There are times when changes don't matter: for example, if you move to a new library version that's backwards compatible. In that case, projects that use the new parent get the new version, as do any projects that link with them, via transitive dependencies. Projects that don't need the new functionality don't need to be updated. Over time, you can migrate these projects to the new POM as you make changes to them.

On the other hand, sometimes the change matters: for example you've modified your database schema, and need to update all projects that use the affected business objects. In this case, the parent again makes your life easier: once you update the dependency property in the parent, it's a simple matter of grepping for that property to find children that need to be updated and re-released.

Monday, July 30, 2012

Taming Maven: Version Properties

Getting started with Maven is easy, and once you use its dependency management feature, you'll wonder why you waited so long. For simple web-apps or single-module projects, it Just Works.

However, most software developers aren't working on simple, one-module projects. We work in organizations that manage many projects, often depending on one-another. And in this situation, the basic Maven project breaks down: you find that every project has a differing set of dependencies, some of which are incompatible. This is the first in a series of postings about taming a multi-project development environment.

To start, replace all of your hardcoded dependency versions with properties.

Projects accumulate dependencies over time: you might start out with a few of the core Spring packages, then add a few of the Apache Commons projects, then a few more Spring projects, then some libraries that another part of your organization maintains. Pretty soon you'll have dozens of dependencies, completely unordered. Just finding a dependency in the mess becomes difficult, even if you have a tool like m2eclipse. And it becomes very easy to have two related dependencies — or even duplicate dependencies — with different versions. Maven can resolve most of these problems automagically, but when it fails, you're in for a long and painful diagnosis session.

But, if you use properties for your dependencies, and adopt a consistent naming strategy for those properties, you may not be able to find your dependency references, but at least the versions will be defined in one place. Start by adding a <properties> section to your POM; I generally place it near the top of the POM, before the <build> and <dependencies> sections (both of these tend to be long).

<properties>
    <org.springframework.version>3.1.1.RELEASE</org.springframework.version>
    <!-- and so on, for all of your dependencies -->
</properties>

Each property is its own element, and the element name is the property name. You can name your properties anything you want (as long as it's a legal XML element name), but for version properties I think that GROUPID.version makes the most sense. Or use GROUPID.ARTIFACTID.version if there are different artifacts for the same group that don't have a common version (for example, Spring Core and Spring Security).

Next, update the dependency to use that property, rather than a hardcoded version number.

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
    <version>${org.springframework.version}</version>
</dependency>

Once all of your POMs use version properties, you can start to organize dependencies across projects. This can be as simple as running grep to find all projects that use a particular property. But Maven gives you a better solution, which will be the topic of tomorrow's post.


This series of posts was prompted by several recent projects where I worked with development organizations that had large project bases built on Maven. I was originally planning to show some of the Vim macros that I used to clean up POMs, but decided instead to start work on a tool to clean up POMs.

Saturday, October 1, 2011

The Role of Automated Tests

Automated testing is moving into the mainstream, adopted as a “best practice” by more companies each year. But why? Here are my reasons, originally intended as bullet points in a presentation on how to write tests.

Tests verify that the program behaves as expected

Let's get one thing out of the way up front: tests can find bugs, but they can't prove that no bugs exist. Or, as my friend Drew puts it: “tests can only show that your incorrect assumptions are internally consistent.”

However, as you increase test coverage, using well-designed tests, you gain confidence that the program will do what you want it to. In other words, that there aren't any obvious bugs. And unless you're writing code for the space shuttle, that's probably good enough.

Tests verify that the program continues to behave as expected when changed

The major portion of a program's development happens after it's released (80% is commonly quoted, but I couldn't find an authoritative reference). The bugs that got through testing will be found by end-users. Requirements will change, ranging from a simple UI facelift, through the addition of new business rules, to the deep structural changes needed to support increased load.

And when you change code, you risk breaking it. Usually in a place that you didn't think would be affected. Even in well-written code, there may be hidden side-effects. A test suite can protect you from the unintended consequences of change, provided again that it has complete coverage and well-designed tests. In my opinion, this is how automated tests provide the most value to the organization.

Of course, a test suite can also become part of a change. If your business rules change, then your tests have to change as well. This should be less of an issue at the level of “unit” tests, but it still happens. Unfortunately, many organizations treat such changes as nothing but an undesired cost. Instead, they should view them as a warning that the code may contain hidden dependencies on the old behavior, and budget extra time for the release.

Tests serve as documentation

The idea of test-as-specification has long been part of Agile orthodoxy, although in practice it can take a lot of work to make that happen with mainstream testing tools. I know that I've written more than my share of test methods with names like testOperation(). But if you have the discipline, a method named testFailureWhenArgumentsWouldCauseIntegerOverflow() is far more useful.
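
As a sketch of the difference (JUnit 4, with a made-up class under test), compare how much each name tells you when it shows up in a failure report:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import org.junit.Test;

public class CheckedAdditionTest {

    // hypothetical class under test: integer addition that rejects overflow
    static class CheckedAddition {
        static int add(int a, int b) {
            long sum = (long) a + (long) b;
            if (sum > Integer.MAX_VALUE || sum < Integer.MIN_VALUE) {
                throw new ArithmeticException("integer overflow");
            }
            return (int) sum;
        }
    }

    // a name like this says nothing useful when it shows up red in a report
    @Test
    public void testOperation() {
        assertEquals(5, CheckedAddition.add(2, 3));
    }

    // a name like this reads as a specification of the behavior under test
    @Test
    public void testFailureWhenArgumentsWouldCauseIntegerOverflow() {
        try {
            CheckedAddition.add(Integer.MAX_VALUE, 1);
            fail("expected ArithmeticException");
        } catch (ArithmeticException expected) {
            // the overflow was detected, exactly as the method name promises
        }
    }
}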

Tests give you a chance to think about your design

To me, this has always been the main benefit of testing: “if it's hard to test, it will be hard to use.” Of course, you can take this to an extreme: I have actually been asked by a traditional QA developer to store an application's internal data in comma-delimited format so that they could validate it (in that case, the binary format already took over 1GB, and was heavily optimized for access speed). While actively harming your design in the name of testability is foolish, it's not the common case.

More realistic is some code that I recently refactored: a single class that created a listener for external data, applied some business logic to the messages received, and sent messages based on that logic. As written, this code was impossible to test without instantiating the entire messaging framework. After refactoring, the business logic was in its own class, with separate listener and sender objects that could be mocked for testing. And that core business logic could now be tested in the form of a specification, with test names like testIgnoresThirdAndSubsequentDuplicateMessages().
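
Here's a minimal sketch of the shape that refactoring takes (all interface and class names are invented for illustration): the business logic depends only on two small interfaces, so a test can hand it in-memory fakes instead of the real messaging framework.

// small interfaces that stand in for the messaging framework
interface MessageListener {
    void subscribe(MessageHandler handler);
}

interface MessageSender {
    void send(String message);
}

interface MessageHandler {
    void onMessage(String message);
}

// the extracted business logic: no framework types, easy to test in isolation
class DuplicateSuppressor implements MessageHandler {
    private final MessageSender sender;
    private final java.util.Map<String, Integer> seenCounts =
            new java.util.HashMap<String, Integer>();

    DuplicateSuppressor(MessageSender sender) {
        this.sender = sender;
    }

    public void onMessage(String message) {
        int timesSeenBefore = seenCounts.containsKey(message) ? seenCounts.get(message) : 0;
        seenCounts.put(message, timesSeenBefore + 1);
        // forward the message unless this is the third (or later) copy
        if (timesSeenBefore < 2) {
            sender.send(message);
        }
    }
}

// wiring, done once at startup; only this line touches the real framework:
//     listener.subscribe(new DuplicateSuppressor(sender));

With that structure, the test for the duplicate-message rule needs nothing more than a fake sender that records what it was asked to send, and a loop that feeds messages to onMessage().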

Wednesday, August 31, 2011

Using a Local Repository Server with Gradle

I've been doing a little work with Gradle recently. And one of the things that I find “less than optimal” is that the build script holds far too much knowledge about its environment, which means that you have to jump through some hoops to make those scripts portable. Not a huge problem for in-house development, but if you're making an open-source library, you don't want everyone else to reconfigure their world to match yours.

One particular problem is how to find your dependencies. A typical build script has a repositories section that lists all the places to look. Here's a simple example that looks first in the local Maven repository, followed by Maven Central:

repositories {
    mavenLocal()
    mavenCentral()
}

This is a portable build script — although I have no idea how dependencies might find their way to the local Maven repository, since Gradle uses its own dependency cache. A better build script might want to use a local repository server rather than constantly hitting Maven Central:

repositories {
    mavenRepo urls: 'http://intranet.example.com/repository'
}

That works, but now you can't share the build script with anybody else, unless they edit the script to use their own repository server (assuming they have one), and remember not to check in their changes. The solution that I came up with is to store the repository URL in $HOME/.gradle/gradle.properties, which is loaded for every build.

internalRepositoryUrl: http://repo.traffic.com:8081/nexus/content/groups/public/

Then, the build script is configured to add the local server only if the property is defined:

repositories {
    mavenLocal()

    if (project.hasProperty('internalRepositoryUrl') )
        mavenRepo urls: project.internalRepositoryUrl
    else
        mavenCentral()
}

It's portable, but it's ugly. When searching for solutions, I saw a couple of postings indicating that gradle.properties will eventually be allowed to contain expressions as well as properties. That day can't come soon enough.

Wednesday, August 10, 2011

Defining Done

It seems that a lot of Agile teams have a problem defining when a task or story is “done.” I've seen that on the teams that I've worked with, occasionally leading to heated argument at the end of a sprint.

For stories, the definition of done is simple: it's done when the product owner says it is. That may horrify people who live and die by acceptance criteria, but the simple fact is that acceptance criteria are fluid. New acceptance criteria are often discovered after the story is committed, and although most of these should spur the creation of a new story, more likely is that they simply get added to the existing one. And sometimes (not often enough), the product owner decides that the story is “good enough.”

At the task level, however, there are no acceptance criteria. Most teams that I've seen have worked out some measure of doneness that involves test coverage and code reviews. But the problem with such criteria is that they don't actually speak to what the task is trying to accomplish. The code for a task could be 100% covered and peer reviewed, but contribute nothing to the product. I think this is especially likely in teams where individual members go off and work on their own tasks, because peer reviews in that situation tend to be technical rather than holistic. As long as the code looks right, it gets a pass.

In my experience, the last chance for holistic review is the sprint planning meeting. Unfortunately, by the time the team gets to tasking, it's often late in the day and everyone wants to go home. But I've found that by simply asking “how do you plan to test this,” the task descriptions get more exact, and — not surprisingly — the time estimates go up.

Saturday, August 6, 2011

The Horizontal Slice

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

That's one of the basic principles of the Agile Manifesto, and a common approach to satisfying it is the “horizontal slice”: a complete application, which takes its inputs from real sources and produces outputs that are consumed by real destinations. The application starts life as a bare skeleton, and each release cycle adds functionality.

In theory, at least, there are a lot of benefits to this approach. First and foremost is the “for tomorrow we ship” ethos: a partially-functioning application is better than no application at all. Second, it allows the team to work out the internal structure of the application, avoiding the “oops!” that usually accompanies integration of components developed in isolation. And not least, it keeps the entire team engaged: there's enough work for everyone, without stepping on each other's toes.

But after two recent green-field projects that used this approach, I think there are some drawbacks that outweigh these benefits.

The first is an over-reliance on those “real” sources and sinks; the development team is stuck if they become unavailable. And this happens a lot in a typical development or integration environment, because other teams are doing the same thing. Developing mock implementations is one way to avoid this problem, but convincing a product owner to spend time on mocks when real data is available is an exercise in futility.

The second problem is that software development proceeds in a quantum fashion. I've written about this with regard to unit testing, but it applies even more to complete projects. There's a lot of groundwork that's needed to make a real-world application. Days, perhaps weeks, go by without anything that could be called “functional”; everything is run from JUnit. And then, suddenly, there's a main(), and the application itself exists. Forcing this process into a two-week sprint cycle encourages programmers to hack together whatever is needed to make a demo, without concern for the long term.

And that results in the third problem — and in my opinion the worst: high coupling between components. When you develop a horizontal slice, I think there's less incentive to focus on unit tests, and more to focus on end-to-end tests. After all, that's how you're being judged, and if you get the same level of coverage, what does it matter?

On the surface, that's a reasonable argument, but unit tests and integration tests have different goals: the latter test functionality, the former lead you to a better design. If you don't have to test your classes in isolation, it's all too easy to rely on services provided by other parts of the application. The result is a barrier to long-term maintenance, which is where most of a team's development effort is spent.

So is there a solution? The best that I can think of is working backwards: creating one module at a time, each producing real, consumable outputs from mock inputs. These modules don't have to be full-featured, and in fact shouldn't be: the goal is to get something that is well-designed. I think that working backwards gives you a much better design than working forwards, because at every stage you know what the downstream stage needs, even if those needs change.
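
As a sketch of what that might look like (all names are invented): the last module in the pipeline is built first, against an interface for its input, with a hard-coded stand-in playing the role of the not-yet-written upstream module.

// the contract the upstream module will eventually fulfill
interface OrderSource {
    java.util.List<String> pendingOrders();
}

// a hard-coded stand-in, used until the real upstream module exists
class MockOrderSource implements OrderSource {
    public java.util.List<String> pendingOrders() {
        return java.util.Arrays.asList("order-1", "order-2");
    }
}

// the downstream module: built first, produces real, consumable output
class InvoiceWriter {
    private final OrderSource source;

    InvoiceWriter(OrderSource source) {
        this.source = source;
    }

    void writeInvoices(java.io.PrintWriter out) {
        for (String order : source.pendingOrders()) {
            out.println("invoice for " + order);
        }
    }
}

When the real upstream module arrives, it implements the same interface and the stand-in is discarded; the downstream module, and its tests, never change.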

I want to say again that this approach is only for building on the green field. To maintain the building metaphor, it's establishing a foundation for the complete system, on which you add stories (pun intended).

Friday, September 17, 2010

Project Management

I recently had a revelation: the first time that I worked with a Project Manager — a person whose sole role is maintaining a schedule and coordinating the tasks on that schedule — was 2002. For nearly 20 years of my career, I worked on teams where project management was a subsidiary role of the team lead or development manager. True, my career has mostly been spent at small companies, some that couldn't afford a dedicated project manager. But there were also a few larger ones — including GE, which you'd expect to be a bastion of project management and rigorous checklist checkers.

Before continuing, I want to say that, unlike many developers, I don't disdain project management per se. I've worked on projects that have succeeded (or at least failed less badly) because a talented project manager pulled together people with diverging goals, people who might have otherwise ignored or actively undercut one-another. I've also worked on projects where the project manager seemed to be actively inflaming the participants. Either way, it's a role with impact, one that cannot be ignored.

So why did I spend two thirds of my career without ever seeing a project manager? I think the answer is that the structure of software development organizations changed over that time, along with the companies where they reside. And that's not necessarily a Good Thing.

But first, a little history. Corporate management, as we know it today, didn't exist before the mid-1800s. Prior to that time, businesses were small and generally confined to a single location; a few hundred employees was an industrial giant. The railroads changed all that: they hired thousands of employees, for a myriad of functions, and those employees were dispersed across the thousands of miles of terrain served by the railroad.

Up to that point, management relied on instant, face-to-face communication between front office and factory floor. This simply was not going to work for the railroads. In response, they adopted and adapted the hierarchical structure of the military, and even some of its terminology. The corporation was now composed of semi-autonomous divisions, which took strategic direction from the home office, but had freedom in tactical operations. Each division had its own complement of functional organizations such as maintenance shops, and those functional organizations kept largely to themselves.

This model worked well for the railroads, and for the giant industrial corporations that followed. You can even see the functional structure embodied in the layout of a manufacturing plant. And it permeated the thinking of the people working for those corporations: at GE in the late 1980s I received a five minute dressing-down from a mid-level manager, for daring to use a photocopy machine that belonged to his group. Even in the software industry, the hierarchical mindset prevailed: as you read The Mythical Man-Month, you won't find “project” managers, just managers.

So where do these project managers come from? I think the answer is construction.

Whether you hire a general contractor for your home remodel, or Bechtel for a billion-dollar highway project, you get a project manager. And they're necessary: the construction industry is fragmented into dozens of different trades, and specialties within trades, even at the level of home repair. Carpenters, electricians, plumbers, masons, sheetrock installers, painters, tilers, landscape designers, and so on … you need all of them, and none of them does another's job. And more important, each works for only a small part of the project schedule, and then they're gone. And if they don't start at exactly the right time, the whole project gets delayed.

It works for construction, so why not software?

In the 1980s and 1990s, corporations started to adopt “matrix management.” The reason was simple economics: self-contained organizations waste money. Just as you wouldn't want to pay a sheetrock crew to sit idle while the carpenters are building stud walls, most organizations don't want to pay a DBA to sit idle while the developers write front-end code. So the DBA team gets matrixed to the project team: when the project team needs a DBA, one will be assigned.

From the company's perspective, this maximizes employee utilization. And from the DBA's perspective, it's a better career path: rather than being isolated in a product-specific development team, she gets to work with her peers and have her work recognized by a manager who doesn't think that mauve databases have more RAM. Everybody wins.

But something I noticed, when working with matrix organizations, is that you could never find a DBA when you needed one — or, as matrix management spread, any other specialist. They always seemed to have other projects demanding their time. Perhaps that was really true: for a corporation wanting to reduce costs, why stop with sharing people, why not understaff as well? But I also noticed that you could always get a DBA to turn up for meetings where there were donuts present.

And what I inferred from this is that matrix management creates a disincentive to project loyalty. After all, the specialist career path depends more on pleasing the specialist manager than the project lead. In the best of cases, specialists can cherry-pick projects that catch their interest, ignoring the rest. In the worst cases, there are lots of places to hide in a matrix organization.

This effect goes deeper than a few DBA's, however. In a fully-matrixed organization, project teams are ad hoc. You no longer have developers who are working on a product, they work on a project. And when it's done — or fails — they move on to another project. Taking with them the in-depth knowledge of what they did and what they should have done. Long term loyalty simply doesn't exist.

And with the creation of ad hoc teams, you need an ad hoc manager: the project manager. So to reiterate, it's not that project managers are bad per se, it's what their presence says about the organization that disturbs me.

Wednesday, August 11, 2010

Agile Isn't New

I recently read C. A. R. Hoare's 1980 ACM Turing Award speech, “The Emperor's Old Clothes” (currently downloadable here). The theme of this speech is simplicity, in particular how lack of simplicity in a programming language makes it harder to write error free code — summarized as “so simple that there are obviously no deficiencies [versus] so complicated that there are no obvious deficiencies” (emphasis as written). This, of course, resonates with my feelings about mental models.

About midway through the speech, Hoare describes a failed project: a new operating system that was to dramatically extend the capabilities of his company's former offering. It reads like a recap of The Mythical Man-Month, right down to the programmers' assumption that memory was infinite. But where Brooks turned to organizational strategies to dig his team out from failure, Hoare did something else:

First, we classified our […] customers into groups […] We assigned to each group of customers a small team of programmers and told the team leader to visit the customers to find out what they wanted […] In no case would we consider a request for a feature that would take more than three months to implement and deliver […] Above all, I did not allow anything to be done which I did not myself understand.

That quote could have come from a book on Extreme Programming. Short iterations, understandable stories, pulling the customer into the development process. It's all there.

Or, I should say, it was all there. In 1965. Presented to a group of practicing programmers in 1980. And then “rediscovered” by Beck, Jeffries, et al in the 1990s.

Why do we keep forgetting?

Thursday, June 17, 2010

Prototypes

In late January, 1984, I wrote about a dozen lines of Macintosh Pascal code. It drew a series of slightly offset circles, creating what appeared to be a maze of connected pipes. That program, unchanged except for a pretentious and wholly inaccurate name given by a marketroid, went on to be the main demo program (and box artwork) for the product. And, barely a month into my first full-time programming job, I had learned the most important lesson of my career:

There's no such thing as a prototype

That utility that you wrote after having a couple of beers at lunch? It's going to be the “power user's” main tool for accessing your application. Expanded, of course, possibly beyond recognition. But deep within will be your drunken code and off-color variable names. And more important, your name will be on the check-in, so you'll always get the support calls — even for the parts that you didn't write. Worse, this code is a broken window, attracting more bad code until there's nothing left but a mess.

Most programmers learn this lesson at some point in their career. As a result, they either hide their throwaway code or make sure that someone else gets the blame for it. Newbie programmers haven't figured it out yet, and will give you a lot of attitude if you suggest taking care when writing such code. Newbie programmers that get promoted to management are the worst: they're the ones who decide that throwaway code should be part of the product.

All of which contributes to the main reason that I like Agile methodologies: they don't leave much room for second-rate code. For one thing, if you write all code test-first, even “spike solutions,” then you have at least some assurance of quality; less if you write tests just to achieve a set coverage metric, more if you use your tests as an opportunity to think about your design.

But you can write tests in any environment. Where Agile stands out is in its use of a backlog for all work, and its attitude that “tomorrow we ship.”

The former acts as a restraint on management: sure, that utility program would be useful to a large audience, but before it becomes part of the product it has to go into the backlog. And prioritized against all other feature requests. And estimated, because once it becomes a real feature it will have to do more than forble the frobulator.

The second point — that an Agile product should always be ready to ship — acts as a restraint on programmers, albeit in a counter-intuitive way. In my opinion, the root of almost all bad code is the feeling that “we gotta get it done”: there's a deadline, there's no time to do it right, so slap something together and hope it works. Agile would seem to encourage this behavior with its short cycle times, but all Agile methodologies include the fallback of “we mis-estimated, this has to be pushed to the next cycle.” If you can deliver three good features instead of four shoddy ones, that's a Good Thing, and it keeps “prototype” code at bay.

Unfortunately, there are a lot of companies that want to adopt Agile processes but can't let go of their hard deadlines and “required” feature lists. But that's a topic for another post.