Thursday, April 30, 2009

What Mercurial Can't Do: Merge by Changeset

UPDATE 12/1/2011: Mercurial CAN do this!  You used to need to enable the transplant extension, but since v2.0 the graft command has been included out of the box.  I should also note that the problems I mention with this approach in TFS do not exist in Mercurial.  So, basically, go use the graft command and don't read this old post (unless you want to read about how bad TFS is).


Mercurial is a distributed source control system that works very well on Windows and has great Windows shell integration with TortoiseHg.

TFS is a centralized behemoth that does source control but also integrates (usually poorly) with every product Microsoft has ever released (not including Bob).

I've used both of these, but I've used TFS much more extensively. I recently started looking into what it would take to switch from TFS to Mercurial and was rather surprised to find a couple things that TFS can do that Mercurial cannot.

The first of these is the ability to do a merge by changeset. In TFS, say you create some branches as follows:
  1. Create a new TFS project called Project
  2. Check in some source at $\Project\Source
  3. Branch that source to $\Project\NewBranch
  4. Do 3 checkins to $\Project\NewBranch
Suppose you're not completely done with whatever it is you're working on, so you can't merge all three changesets back to $\Project\Source just yet. But let's say your second changeset includes a bug fix to a file that you won't be changing as part of your "main" work in the other changesets. Maybe you want to go ahead and merge this bugfix back to $\Project\Source right now.

In TFS you could do this very easily.
  1. In Source Control Explorer, go to $\Project\NewBranch
  2. Right click and select "merge"
  3. Change to the "Merge by selected changeset" radio button
  4. Make sure the target is $\Project\Source
  5. Click next
  6. On the next page, select only the second changeset
  7. Click next and the merge is performed
Ta-Da! You just merged changeset #2 without merging changeset #1 or #3.

You cannot do this in Mercurial, at least, not with a "merge" operation. The only way to accomplish the same type of thing would be to create a patch out of NewBranch and apply it to Source using the hg export and hg import commands.

So the big question is, why can't you do this in Mercurial? The answer goes to the heart of what makes Mercurial so different from TFS. The first thing to realize is that Mercurial does not have "branch lines."

In TFS, when you branch code, TFS knows that one branch is the parent of the other, and you can only merge across that branch line. This means you can't do merges between two siblings. For example, in B -> A <- C, you can't merge B to C. You can only merge along the branch lines (unless you do a baseless merge, which doesn't really count). In Mercurial, the normal way of creating a branch is to simply clone the repository, which means you have a full copy of the entire history of the repo. The image to the right shows what an hg pull would look like when bringing changes from a cloned repository into the original repository.

"A" represents the starting point. "B" represents the first change from the repository we're pulling in. What you can see here is that when you pull changes in Mercurial, a temporary "branch" is created containing all the changesets from NewBranch in parallel with any changesets from Source.

"C" is the only actual merge, in the traditional way we think of merging, because it actually brings all the changes together. This is beautiful in its simplicity because until you get to C you don't have to do any work. Each changeset represents what changed from the parent, so you just import all the changesets and associate them with the correct parent. Then only at the very end do you have to do any merges.

A merge in TFS is not so clever. Basically all TFS does is figure out every file that changed on either side, and do a 3-way merge on each in turn, resulting in a new changeset. The upside of this is we can select a single changeset, ignore all the changesets around it, and do a merge.

In Mercurial, you can't pick just one changeset and merge. You have to merge all the changesets before it too because that's the definition of how a merge works in Mercurial. The upside of this is that all the changesets are preserved.
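To make the "you have to merge all the changesets before it too" point concrete, here is a minimal sketch of a changeset graph. It is a toy model, not Mercurial's code: the names base, n1, n2 and n3 are made up for illustration, with n1 to n3 standing in for the three NewBranch checkins.

```python
# Toy changeset graph: each changeset maps to the list of its parents.
# "base" is the shared starting point; n1..n3 are hypothetical names for
# the three checkins made on NewBranch.
PARENTS = {
    "base": [],
    "n1": ["base"],
    "n2": ["n1"],
    "n3": ["n2"],
}


def ancestors(graph, node):
    """Return node plus every changeset reachable through parent links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def brought_in_by_merge(graph, target_head, source_rev):
    """Changesets that merging source_rev would add to target_head's history."""
    return ancestors(graph, source_rev) - ancestors(graph, target_head)


# Merging the second checkin drags the first one along, but not the third.
print(sorted(brought_in_by_merge(PARENTS, "base", "n2")))  # ['n1', 'n2']
```

Because a merge is defined in terms of ancestry, there is no way to ask for n2 without n1 in this model.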

For example, say you want to know who added a certain file. In Mercurial, you'll be able to figure this out regardless of what "branch" (cloned repository) it was added in. In TFS, you're screwed because the file will be added in a "merge" changeset. The merge may not have been done by the same person who added the file (in fact, it usually won't be), so to find out who added it, you have to manually follow the branches and inspect the history on each in turn. The same is true (and worse, actually) if you want to know who updated a line in a certain file.

Sadly for me, we're constantly "de-tangling" our changes by doing merges by changeset. But let's think about that for a minute. Is merging a single changeset even a sane thing to do? It turns out, not so much, because it's possible for this to result in a broken state. Here's how:
  1. Joe Bob adds a new file "hippo.cs" and updates the C# project file
  2. Joe Smith adds a different new file "giraffe.cs" and updates the C# project file
  3. Joe Smith merges his changeset and ONLY his changeset up
  4. The result of the merge does not compile. The error is, "Can not find file 'hippo.cs'"
WTF!? Why is it looking for hippo.cs?! We didn't merge that changeset! How does it even know about hippo.cs? It's because Joe Bob and Joe Smith both changed the C# project file. When Joe Smith changed it, Joe Bob's changes were already in it. So Joe Smith's final C# project file includes Joe Bob's "hippo.cs" file. But when Joe Smith did a merge, he didn't include Joe Bob's changeset, so the hippo.cs file didn't get merged. And now you're broken.
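The breakage can be reproduced with a small toy model. This is only a sketch under assumptions: a source tree is a dict of file name to contents, the project file is reduced to a list of the files it references, and the names Project.csproj and main.cs are invented for the example. The merge is modeled as "take the branch's version of each file the changeset touched", which is what a 3-way merge comes down to when the target side has not changed.

```python
PROJECT = "Project.csproj"  # hypothetical project file name


def cherry_pick(target_tree, changeset_files):
    """Copy only the files touched by one changeset onto the target tree."""
    merged = dict(target_tree)
    merged.update(changeset_files)
    return merged


def missing_references(tree):
    """List files the project file mentions that are absent from the tree."""
    return [name for name in tree[PROJECT] if name not in tree]


# The parent branch before any merge.
source = {PROJECT: ["main.cs"], "main.cs": "// existing code"}

# Joe Smith's changeset touched two files. His project file was edited on
# top of Joe Bob's, so it already lists hippo.cs.
joe_smith_changeset = {
    PROJECT: ["main.cs", "hippo.cs", "giraffe.cs"],
    "giraffe.cs": "// new file",
}

merged = cherry_pick(source, joe_smith_changeset)
print(missing_references(merged))  # ['hippo.cs']
```

The project file arrives whole, Joe Bob's line included, while hippo.cs itself stays behind in the unmerged changeset.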

This happens like all the freaking time with project files (which are the bane of branches and merges). But fortunately it's easy to fix. Just remove the missing files from the project file. But I think you can probably see that if this happened to any file other than the project file, like a real source file, you'd be in a world of hurt.

I'm actually STUNNED, given how much effort TFS puts into protecting the users from themselves, that it allows you to merge selected changesets in this way! But it does. And it's a feature that Mercurial just can't match, even if it is a feature that can lead to trouble. But maybe that's a good thing.

Tuesday, April 28, 2009

Merging with TFS

We celebrated "TFS Sucks" day at my office last Friday, so this blog post is a little late.

I don't understand this behavior. I'm merging one branch back to another by selected changeset (so that I can go through what's changed before I merge). I tell it to do the merge and it comes back with the "Resolve Conflicts" dialog. There was one file in the changeset, so I see one file.

According to the dialog: "Conflicting changes have been detected. To resolve conflicts, select items and click Resolve."

When I right click on it and Compare -> Source to Target... I don't see changes on both sides, I only see the changes I knew I was bringing in.

When I right click on it and Compare -> Source to Base... I see exactly the same changes.

When I right click on it and Compare -> Target to Base... my tool tells me there are no changes.

So, this means there are no changes in my "Target." If that's the case, what "Conflicting changes" have been detected? There is no conflict! Of course, when I tell it Auto Merge All, it succeeds, but why did I have to go through this step at all?
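For reference, this is the textbook three-way rule I had in my head, as a minimal sketch. It says nothing about how TFS actually decides (that is exactly what I can't figure out); the function name and the sample strings are made up.

```python
def three_way(base, source, target):
    """Textbook three-way decision for one file's contents.

    A conflict exists only when BOTH sides differ from the base and
    also differ from each other.
    """
    if source == target:
        return "identical", target
    if target == base:
        return "take source", source
    if source == base:
        return "keep target", target
    return "conflict", None


base = "int x = 1;"
source = "int x = 2;"   # the change being brought in
target = base           # Target to Base shows no changes

print(three_way(base, source, target))  # ('take source', 'int x = 2;')
```

By this rule, a target that matches the base should never produce a conflict, which is why the dialog is so puzzling.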

I'm using SourceGear's DiffMerge as my merge tool, maybe that has something to do with it?

Monday, April 27, 2009

Vim Update Many Files

Here's a really nice Vim trick I learned today and have already used twice.

I'll demonstrate it with an example. I wanted to go through all my .cs files and fix the formatting (they all had 4 spaces per tab and I wanted to update them to 2 spaces per tab).

First, I opened Vim and set the current directory to the root of my source folder.
:cd C:\blah\blah\source\code

Next, I told Vim to pull in all my .cs files in the entire source tree.
:args .\**\*.cs

** means to recurse into subdirectories. There is a limit on how deep it goes; try :help starstar-wildcard for more.
Finally, I told it I wanted it to format each file, from beginning to end, using the = operator (which filters through 'equalprg' if that option is set, and uses Vim's built-in indenting otherwise) and then to save it.
:argdo exe "normal gg=G" | w

To break this down:
  • :argdo tells it to run the specified command on all "args", that is, every file in the argument list
  • exe tells it to execute the given command
  • normal runs the :normal command, which allows you to execute normal mode commands like motions, etc
  • gg goes to the top of the file, = is the format operator, and G is the motion to the end of the file, so gg=G formats the whole file
  • | chains commands together
  • w writes the file
Pretty awesome I thought.

Another common use of :argdo is to run a search and replace across many files. Open the files you want either with the :args command or from the command line. Then run the following (the e flag keeps :s from reporting an error in files that have no match):
:argdo %s/matchon/replacewith/ge | w
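If you ever need the same loop outside of Vim, here is a rough Python sketch of what that :argdo line does. It is a different technique, not Vim: the function name replace_in_tree is made up, and Python's regex syntax is not the same as Vim's, so patterns may need adjusting.

```python
import re
from pathlib import Path


def replace_in_tree(root, pattern, replacement, glob="*.cs"):
    """Apply a regex substitution to every matching file under root.

    Returns the files that actually changed. Files without a match are
    left untouched, similar in spirit to the e flag on :s.
    """
    changed = []
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        # re.sub replaces every occurrence, like the g flag on :s.
        updated = re.sub(pattern, replacement, original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling replace_in_tree(".", r"matchon", "replacewith") from the source root would rewrite the files in place, so try it on a copy first.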


If this would require you to open a ridiculous number of files, most of which wouldn't have matches, then you should use the :vimgrep command instead. This will put only files that match your regex on the quickfix list. You can then move through them with :cnext and execute any commands you want. The downside here is that you have to execute your command over and over in each file.

Thursday, April 23, 2009

Word CommandBarButton Tag

I'm guessing that no one cares about this, but I just spent all day tracking down this issue and blogging about it is going to make me feel better.

I was adding buttons to the toolbar in Word through Visual Studio Tools for Office (VSTO). My situation had me adding buttons to certain open documents. I ran into a problem where after adding two (or more) buttons to two (or more) documents I started getting two (or more) click events every time I clicked (just once!) on any of the buttons.

CommandBar cmdBar; // get this from somewhere
CommandBarButton cmdBarBtn = (CommandBarButton)cmdBar.Controls.Add( blah, blah, blah );
cmdBarBtn.Caption = "Example";
cmdBarBtn.Tag = "Example";
cmdBarBtn.Click += btn_Click;

Turns out, that Tag property is really really special. You expect the CommandBarButton object to pretty much take care of differentiating one button from another, but that's not how it works in Word.

Word uses the Tag to differentiate between all the various button instances. Even though executing the code above twice DEFINITELY creates two different button instances (I verified by changing the Caption on the second one), Word can't tell them apart because they have the same Tag. So, because you registered two click handlers on the same Tag, you get two click events to fire.

I changed my code to:
cmdBarBtn.Tag = Guid.NewGuid().ToString();

Now my clicks are properly associated with the button that was clicked.
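Here is a toy model of the behavior I observed, written in Python to keep it short. It is purely an illustration and not how Word is implemented: the class TagKeyedToolbar and its methods are invented, and uuid.uuid4() stands in for Guid.NewGuid().

```python
import uuid
from collections import defaultdict


class TagKeyedToolbar:
    """Toy model: click handlers are looked up by Tag, not by button object."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def add_button(self, tag, handler):
        """Register a button's click handler under its tag and return the tag."""
        self._handlers[tag].append(handler)
        return tag

    def click(self, tag):
        """Fire every handler registered under this tag."""
        return [handler() for handler in self._handlers[tag]]


# Same Tag on two buttons: one click fires both handlers.
shared = TagKeyedToolbar()
shared.add_button("Example", lambda: "doc1 handler")
shared.add_button("Example", lambda: "doc2 handler")
print(shared.click("Example"))  # ['doc1 handler', 'doc2 handler']

# Unique Tag per button: one click, one handler.
unique = TagKeyedToolbar()
first = unique.add_button(str(uuid.uuid4()), lambda: "doc1 handler")
second = unique.add_button(str(uuid.uuid4()), lambda: "doc2 handler")
print(unique.click(first))  # ['doc1 handler']
```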

Ahh... I do feel better.

The look and the feel

Those of you who read the random stuff I write here regularly will notice that I changed the blogger template to the real basic one. The reason is that this one stretches to fill the window.

My posts have been getting longer, and the old template constrained the post content to 460px, which is just ridiculously small and made the long-ish posts I was writing look like bloody novels.

I may try to spend some time and create a very simple template all my own, but for now I think this one will do just fine. I mean, Steve Yegge uses it, so why can't I?

What are your thoughts on fixed width vs. variable width blogs? And what's an acceptable fixed width for a blog like this one?