Friday, May 1, 2009

What Mercurial Can't Do: Subtree Repos

On the Mercurial website, at the very bottom of this page, is a section titled "What Mercurial can't do," which says that Mercurial can't host related projects together in one repository. It then links to the Forest Extension, which makes nested repositories easier to work with (it ships with TortoiseHg; you just have to turn it on).

It's true that Mercurial can't do this, not even with the Forest Extension. Apparently there are plans to build some kind of subtree repo support into Mercurial in the near future, but I'm not sure what it will look like, and it might not enable the behavior I'm going to discuss here. We'll have to wait and see.

To show what this means in practice, here's a possible layout you might find in TFS:
$\
    MathUtils\
    GraphUtils\
    ProjectA\
        ProjectA\
        MathUtils\
        GraphUtils\

In this example, MathUtils and GraphUtils are separate projects written to be reused by other projects, and ProjectA reuses both of them.

ProjectA could just reference $\MathUtils and $\GraphUtils directly. But if many projects reuse and modify them, that gets hairy: every time someone working on ProjectB changes MathUtils and checks in, they could immediately break ProjectA. To avoid this, ProjectA wants to isolate itself from changes other people make to the shared utils, so the team creates branches of MathUtils and GraphUtils inside their own project. Now they can decide when it's the right time to bring in other people's changes, and when it's the right time to share their own.

To bring in outside changes in TFS, you'd right-click on $\MathUtils, choose Merge, and select $\ProjectA\MathUtils as the target. To push changes back, you'd right-click on $\ProjectA\MathUtils, choose Merge, and select $\MathUtils as the target.

In TFS, if you changed both ProjectA and MathUtils and checked in, you'd get a single changeset containing all of the changes. When you later merged MathUtils back, that changeset would effectively be split, so only the changes under $\ProjectA\MathUtils would be merged into $\MathUtils (for more on how TFS does merges, see this post: What Mercurial Can't Do: Merge by Changeset). This works because TFS doesn't treat a changeset as an indivisible unit across branches.
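To make the "splitting" concrete, here's a toy model in Python. It is not how TFS is actually implemented; it just assumes a changeset is a list of file changes, and that a path-based merge keeps only the changes under the source path and rebases them onto the target. FileChange and changes_for_merge are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    """One edited file within a changeset (a hypothetical, simplified model)."""
    path: str
    diff: str


def changes_for_merge(changeset, source, target):
    """Return the part of `changeset` that a path-based merge from `source`
    to `target` would carry over, with paths rebased onto `target`.

    Changes outside `source` (for example, ProjectA's own code) are dropped,
    which is how one checkin ends up "split" across branches.
    """
    prefix = source.rstrip("\\") + "\\"  # trailing separator avoids matching MathUtilsExtra
    merged = []
    for change in changeset:
        if change.path.startswith(prefix):
            relative = change.path[len(prefix):]
            new_path = target.rstrip("\\") + "\\" + relative
            merged.append(FileChange(new_path, change.diff))
    return merged


# One checkin that touched both ProjectA's own code and its MathUtils branch.
checkin = [
    FileChange(r"$\ProjectA\ProjectA\Feature.cs", "whizbang support"),
    FileChange(r"$\ProjectA\MathUtils\Vector.cs", "add Vector.Normalize"),
]

for change in changes_for_merge(checkin, r"$\ProjectA\MathUtils", r"$\MathUtils"):
    print(change.path, "-", change.diff)  # prints: $\MathUtils\Vector.cs - add Vector.Normalize
```

Only the MathUtils edit survives the merge, rebased onto $\MathUtils; the ProjectA feature change is left behind.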

This does not work in Mercurial. You could set up the same directory structure, but you couldn't use a single repository. Instead, $\ProjectA\MathUtils and $\ProjectA\GraphUtils would have to be their own repositories, nested inside the $\ProjectA repository. Mercurial is smart: it won't try to check in MathUtils or GraphUtils files to ProjectA, because it recognizes them as distinct repos.
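The rule that keeps nested repos apart can be approximated as "a file belongs to the innermost enclosing directory that has a .hg folder." Here's a minimal Python sketch of that rule, assuming the nested layout just described. owning_repo is a hypothetical helper, not part of Mercurial, and it glosses over the details of Mercurial's real working-directory walk.

```python
import os
import tempfile


def owning_repo(path):
    """Return the root of the innermost repository that contains `path`.

    A directory counts as a repository root if it has a `.hg` subdirectory,
    so a file inside a nested repo belongs to that repo, not to its parent.
    Returns None if no enclosing repository is found.
    """
    current = os.path.abspath(path)
    if not os.path.isdir(current):
        current = os.path.dirname(current)
    while True:
        if os.path.isdir(os.path.join(current, ".hg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent


# Build a throwaway copy of the nested layout: ProjectA with MathUtils inside it.
with tempfile.TemporaryDirectory() as root:
    project_a = os.path.join(root, "ProjectA")
    math_utils = os.path.join(project_a, "MathUtils")
    for repo in (project_a, math_utils):
        os.makedirs(os.path.join(repo, ".hg"))
    os.makedirs(os.path.join(project_a, "ProjectA"))
    feature = os.path.join(project_a, "ProjectA", "Feature.cs")
    vector = os.path.join(math_utils, "Vector.cs")
    for name in (feature, vector):
        open(name, "w").close()

    print(os.path.samefile(owning_repo(feature), project_a))  # True
    print(os.path.samefile(owning_repo(vector), math_utils))  # True
```

Even though Vector.cs sits under the ProjectA directory, the nearer .hg folder in MathUtils claims it, so ProjectA's repository never sees it.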

So this setup actually works just fine, but it's more work. In TFS, we could do one checkin; in Mercurial, we have to check in changes to each repo individually. That's probably not such a big deal, but in TFS you didn't have to keep track of where you were making changes. In Mercurial you might forget that you changed something in MathUtils, and then forget to check it in, because when you run "hg status" in ProjectA you'll only see changes in ProjectA. This is where the Forest Extension comes in handy. It adds an "hg fstatus" command, which is basically a recursive status command. That way you won't lose track of your changes, but you still have to commit them individually.
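This isn't the Forest Extension's actual code, just a rough Python sketch of the idea behind a recursive status: find every directory under a root that contains a .hg folder, then run "hg status" in each one. find_repos and recursive_status are hypothetical names, and the sketch assumes the hg executable is on your PATH.

```python
import os
import subprocess


def find_repos(top):
    """Yield every Mercurial repository root at or below `top`, outermost first.

    A directory is treated as a repository root if it contains `.hg`.
    """
    for dirpath, dirnames, _filenames in os.walk(top):
        if ".hg" in dirnames:
            yield dirpath
            dirnames.remove(".hg")  # don't descend into the repo's metadata store
        dirnames.sort()  # visit child directories in a deterministic order


def recursive_status(top):
    """Run `hg status` in each repository under `top`, printing non-empty results."""
    for repo in find_repos(top):
        result = subprocess.run(
            ["hg", "--repository", repo, "status"],
            capture_output=True,
            text=True,
            check=True,  # raise if hg reports an error for this repo
        )
        if result.stdout:
            print(f"[{repo}]")
            print(result.stdout, end="")
```

For example, recursive_status("ProjectA") would report ProjectA first, then GraphUtils and MathUtils, each under its own heading, and skip any repo with no changes. It only makes the changes visible; you'd still commit in each repo separately.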

But let's think about this for a minute. Is it really smart to have one changeset with changes that affect both ProjectA and MathUtils? What's your checkin comment likely to be? Probably something like "ProjectA's xxx feature now can whizbang." If you're looking at the change history for $\ProjectA\MathUtils and you see "ProjectA's xxx feature now can whizbang," does that make any sense? Does it tell you what changed in MathUtils?

So once again, this is something that Mercurial just can't do. But maybe that's a good thing.