blog.kdgregory.com: February 2012

Thursday, February 23, 2012

What Does Namespace URI Tell You?

When working with XML, you'll often see namespace definitions like this, and wonder what significance the “http” URI might have.

<beans xmlns="http://www.springframework.org/schema/beans"
             xmlns:int="http://www.springframework.org/schema/integration"
             xmlns:ctx="http://www.springframework.org/schema/context"
             …

The answer: absolutely none. It's simply a string that the parser associates with each namespaced element or attribute. As far as the parser is concerned, it could be “foo”.

I can already hear people saying “No, you're wrong! Put that URL into your browser!” And, in the case of Spring namespaces, the URL will take you to a page where you can see the relevant schemas. But this is simply a convenience provided by SpringSource, a way to make developers' lives easier.

It's not mandated by specification, nor suggested by convention. And most of us don't have the control over our corporate website that would be needed to make it happen. If you do, great; if not, don't feel that you need to use an “http” namespace URL.

I personally prefer a URN — a Uniform Resource Name, rather than a Uniform Resource Locator — because it doesn't give the impression that there might be something that could be located.

The rules for constructing a URN are simple:

<URN> ::= "urn:" <NID> ":" <NSS>

The NID portion is the “namespace identifier,” and there are also rules for registering your namespace with the Internet Assigned Numbers Authority. But those same rules allow “experimental” namespaces, which are anything starting with “X-”. So you can construct perfectly legal URNs using your company's domain name (reversed here, in keeping with the Java package naming convention).

urn:X-com.mycompany:NSS

The NSS portion is the “namespace specific string,” and can be anything you want. If you have corporate coding standards, you can follow them; otherwise, something like “PROJECT:PURPOSE” works well:

urn:X-com.mycompany:frobulator:schemas
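Because the convention is mechanical, it's easy to script. The following is a minimal bash sketch: make_urn is a hypothetical helper (not part of any tool), and it simply reverses the dot-separated components of a domain name and appends the project and purpose strings in the form described above.

```shell
# make_urn DOMAIN PROJECT PURPOSE
# Hypothetical helper: builds urn:X-<reversed domain>:<project>:<purpose>
make_urn() {
    local domain=$1 project=$2 purpose=$3
    local reversed
    # reverse the dot-separated components: mycompany.com -> com.mycompany
    reversed=$(printf '%s\n' "$domain" |
        awk -F. '{ for (i = NF; i > 0; i--) printf "%s%s", $i, (i > 1 ? "." : "") }')
    printf 'urn:X-%s:%s:%s\n' "$reversed" "$project" "$purpose"
}

make_urn mycompany.com frobulator schemas
# prints urn:X-com.mycompany:frobulator:schemas
```

The helper only assembles strings; it doesn't register or validate anything, which is exactly the point: the result is just a unique name for grouping elements and attributes.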

And that's it. You have a namespace URI that won't confuse anybody as to its purpose: it's used only to uniquely group elements and attributes.

Friday, February 17, 2012

Setting up a Subversion Repository for a Small Team

Most organizations that use Subversion for source control set up an Apache server in front of the repository. For a small team, this can be daunting, especially if nobody on the team has experience running and configuring Apache. Even in a larger organization, with experienced Apache admins, I question the benefit, especially if you rely on Apache's built-in authentication mechanism. That mechanism involves keeping a private list of Subversion users, separate from your organization's main user list, and that list is unlikely to get updated every time someone joins or leaves the organization.

There is an alternative: Subversion comes with the svnserve server, which listens on port 3690 and speaks its own protocol with Subversion clients. It's deceptively easy to use:

# on the server
svnserve -d -r /opt/svn/repo/

# and on the client ...
svn co svn://localhost workdir

But there are several problems with running svnserve like this. First, you manually started the server, and you have to restart it if the system ever goes down. While you can configure inetd to automatically start the server, that adds a task for your sysadmins. And unless your developers are also your sysadmins, this will take time and may be a point of contention.

A bigger problem is that running svnserve in this way means that all commits are anonymous. The output of svn annotate will be meaningless, but worse, anybody that can physically access your repository will be able to update it. Clearly, you wouldn't want to expose this repo through the corporate firewall. And while svnserve provides several options for authentication, they take you back to a second user database.

But there's an alternative: identify the repository using an svn+ssh URL. This tells the client to open an SSH connection to the server and run svnserve over that connection.

svn co svn+ssh://localhost/opt/svn/repo workdir

There is a lot to recommend this approach. First, it uses the server's own user database for authentication and authorization; no more anonymous commits. Second, system administrators are a lot more willing to open the SSH port to the outside world than to expose some custom server that may have unknown security holes. Third, an experienced user can set up public key authentication, making the whole login process transparent.

The biggest drawback to running Subversion in this way is that, because each user connects as himself or herself, all users need to be able to physically write files in the Subversion repository. In a small team, where everybody knows everybody else, I don't see this as a big issue. However, if you're concerned about security, you can take the following steps:

Use a dedicated host for the Subversion repository, and configure SSH to limit “ordinary” users to only run svnserve (you'll still need full access for admins).
Create a “subversion” user group (or, for larger teams, a project group) and assign it to the people who are to have access.

Once you've created the repository, you need to grant write permissions on some of the files in it. Assuming that you're creating repositories in the /opt/svn directory, and have a subversion group, here are the commands to set up your repository (run by root):

cd /opt/svn
svnadmin create repo

# there are some files in the repository that get created from the first commit;
# having root create the standard trunk/tags/branches layout gets you that commit

svn co file://`pwd`/repo working
cd working
svn mkdir trunk
svn mkdir tags
svn mkdir branches
svn commit -m "create initial structure"
cd ..
rm -rf working

# now we can modify the actual repository 

chgrp -R subversion repo
cd repo/db
chmod g+w .
chmod -R g+w current min-unpacked-rev rep-cache.db revprops revs transactions txn-* write-lock

If you've changed all the files correctly, running ls -al on the /opt/svn/repo/db directory should show you this:

total 56
drwxrwsr-x 6 root subversion 4096 Feb 11 22:02 .
drwxr-xr-x 6 root subversion 4096 Feb 11 21:53 ..
-rw-rw-r-- 1 u1   subversion    2 Feb 11 22:02 current
-r--r--r-- 1 root subversion   22 Feb 11 21:53 format
-rw-r--r-- 1 root subversion 1920 Feb 11 21:53 fsfs.conf
-rw-r--r-- 1 root subversion    5 Feb 11 21:53 fs-type
-rw-rw-r-- 1 root subversion    2 Feb 11 21:53 min-unpacked-rev
-rw-rw-r-- 1 root subversion 4096 Feb 11 21:54 rep-cache.db
drwxrwsr-x 3 root subversion 4096 Feb 11 21:53 revprops
drwxrwsr-x 3 root subversion 4096 Feb 11 21:53 revs
drwxrwsr-x 2 root subversion 4096 Feb 11 22:02 transactions
-rw-rw-r-- 1 u1   subversion    2 Feb 11 22:02 txn-current
-rw-rw-r-- 1 root subversion    0 Feb 11 21:53 txn-current-lock
drwxrwsr-x 2 root subversion 4096 Feb 11 22:02 txn-protorevs
-rw-r--r-- 1 root subversion   37 Feb 11 21:53 uuid
-rw-rw-r-- 1 root subversion    0 Feb 11 21:53 write-lock
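Rather than eyeballing the listing, you could script the check. This is a minimal bash sketch under a few assumptions: check_db_perms is a hypothetical helper, it must be run from inside the repository's db directory, and for revprops, revs, transactions and txn-protorevs it checks only the directories, not the files in them, on the assumption that Subversion may create individual revision files read-only.

```shell
# check_db_perms: hypothetical helper. Run it from inside the repository's db
# directory; it prints anything the group can't write and returns non-zero.
check_db_perms() {
    local bad
    bad=$(
        # the db directory itself plus the top-level files rewritten on commit;
        # -prune keeps find from descending into the db directory
        find . current min-unpacked-rev rep-cache.db txn-current txn-current-lock write-lock \
            -prune ! -perm -g+w -print
        # every directory in these trees; files inside them are not checked
        find revprops revs transactions txn-protorevs -type d ! -perm -g+w -print
    )
    if [ -n "$bad" ]; then
        printf 'not group-writable:\n%s\n' "$bad"
        return 1
    fi
    echo "all group-writable"
}
```

Usage, as root or as any member of the subversion group: cd /opt/svn/repo/db && check_db_perms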

Wednesday, February 15, 2012

The Role of a Manager

Joel Spolsky recently wrote a guest post for the VC funding StackExchange. It's similar to many of his posts on management, following the theme that management “exists to move the furniture around so that [the people with knowledge] can make the hard decisions.”

It's a theme that has been repeated many times, by many writers, and I generally agree with it. But Joel tries to support this theme with an example that I emphatically disagree with.

When two engineers get into an argument about whether to use one big Flash SSD drive or several small SSD drives, do you really think the CEO is going to know better than the two line engineers, who have just spent three days arguing and researching and testing?

The easy answer is “no, the CEO won't know better.” But that's not the point. If you have two engineers who can't come to a consensus after three days, then you need a CEO, or a line manager, who can step in and break the logjam: someone who can listen to the arguments and discern what actually matters to the business.

Or who can say “I can't tell the difference but we can't argue forever. I'll flip a coin and if we're wrong we'll revisit the decision.”

The best manager that I ever had was non-technical, and knew it. He also had a great bullshit detector, and generally knew when to let his subordinates fight. But when the time came to make a decision, he didn't hesitate to make one.

Too many managers don't have the skills to mediate between equally good options. Or they aren't willing to take responsibility for decisions. In which case, they serve no higher role than moving the furniture around. But the world doesn't need managers like that.

Tuesday, February 14, 2012

A Bash Function to Create Git Branches

I'm currently doing some work on our corporate website, working through a long list of generally minor issues and learning Rails in the process. I'm still at the point where I want to code-review every change, and I also have the philosophy that every issue should be independent. So I'm creating a lot of branches.

After accidentally creating a couple of those branches from other branches rather than master, I added the following function to my .bashrc:

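function branch {
    # an existing local branch: just switch to it
    if git show-ref --verify --quiet "refs/heads/$1" ; then
        git checkout "$1"
    else
        # a new branch: always start from master, stopping at the first failure
        git checkout master &&
        git checkout -b "$1" &&
        git push origin "$1"
    fi
}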

Invoke this as branch branch_name. If there's already a local branch with that name, it runs git checkout to switch to that branch, which may fail if you have uncommitted changes on the current branch.

Otherwise, it switches to master and then creates the new branch, in both the local and origin repositories (we use GitHub at work, so everyone on a project has his or her own fork). If you don't have a similar setup, delete that push, or change it to point to a backup repository.
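git checkout also accepts a start point, so the two checkouts in the else branch can be collapsed into one. The following is a minimal, local-only sketch: branch_from_master is a hypothetical name, it leaves out the push to origin, and it assumes the base branch is called master.

```shell
# branch_from_master: hypothetical local-only variant of the branch function
function branch_from_master {
    if git show-ref --verify --quiet "refs/heads/$1" ; then
        # the branch already exists locally: switch to it
        git checkout "$1"
    else
        # create the branch at master's tip and switch to it in one step
        git checkout -b "$1" master
    fi
}
```

The end state is the same as checking out master and then branching; the difference is that there's no intermediate step that can leave you on master if the second command fails.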
