kwblog: February 2008

Tuesday, February 26, 2008

Source Control Branching

Source Control systems have a number of purposes:

1. Track every change made to the source
2. Help multiple developers work together
3. Help multiple teams work in parallel

Different source control systems are better at each of these. For example, Microsoft's Source Safe works fine for #1 and #2, but sucks at #3. This is because it doesn't really have branching support.

Microsoft's TFS, on the other hand, is quite good at all these things. It is every bit as nice as Subversion or Perforce or Vault. This is mainly because it has real branching.

However, unless your company consists of only 2 or 3 people you can't just start randomly branching things. You need a basic plan of some sort. A kind of branching strategy.

There are lots of resources out there that provide this kind of information. One of the best I have found, which is specific to TFS, is the Microsoft Team Foundation Server Branching Guidance. There have also been articles in the IEEE about branching, such as The Importance of Branching Models in SCM. And of course there are many blogs which touch on the subject such as Top 10 Tips On Version Control for Small Agile Software Teams.

All of these resources are helpful, but what they indicate is that the kind of branching strategy you use really depends on the structure of your organization. There does not seem to be a one-size-fits-all method. For example, at Microsoft "Feature Teams" reign, and therefore Feature Branches are the way to go. At other big companies the development teams may not even be aware of the branching strategy. They just work in the environment provided for them and other people promote their changes for them. It's also possible that small teams have no need for branching at all.

What I find fascinating in all of this though is the lack of any discussion about Shared Components. I've talked about this in past posts like Two Versions of the Same Shared Assembly. In that series I was trying to find a way to support two different versions of an assembly in the same "Solution" at the same time. Ultimately I determined that there is just no solid way of doing this that is worth the effort.

Now I'd like to look at how having Shared Components affects a branching strategy. All the references I've linked to so far only consider a single project at a time. They don't talk about how a project's dependencies on other projects should be handled in source control.

Let's start by looking at what we want to gain from branching:

Keep untested changes from being released:
    to clients
    to dependent internal projects
Allow projects to decide when to make changes available to other projects
Allow projects to decide when to integrate changes from other projects

The first question: Should you depend on source (project references) or DLLs?
The answer depends on your environment. If you need to be able to work on a dependent project and an application hand in hand, at the same time, you have to use source. If you want to be able to perform one-step refactorings you also have to use source.

You might think that you could switch between DLLs and source as needed, but if you work out the details of this you'll see it doesn't work. Mainly because you end up working on a "release" source branch, which you shouldn't be making changes to.

So, if you have distinct teams that work on different projects with little overlap, you can probably reference DLLs. But if you have people who work on a little of everything, you probably need to reference source.

The second question: What branching strategies should we consider?

On Demand Application Isolation: Applications branch their code and all dependent code into a TEST or RELEASE folder when they want to isolate themselves, or "freeze" their dependencies. When they are not frozen they simply reference the single source control store of their dependencies.
Team Promotion: Shared components do development work in a DEV branch and when the changes are ready for the wild they merge them into a PUBLIC branch. Applications reference their dependencies' PUBLIC branches.
Team Promotion/Application Isolation: Combination of the other two. Applications reference their dependencies' PUBLIC branches. Then branch everything into TEST and RELEASE.
Distributed Projects: Shared components have DEV and PUBLIC branches. Applications branch from PUBLIC into all of their branches, DEV, TEST, and RELEASE.
Distributed Individuals: Use a source control system like Mercurial, or Git...

Of all of these Distributed Individuals is the most intriguing and flexible. However, it can't be done with a standard centralized source control store, so I have to disregard it for now.

Distributed Projects is the next most flexible. In this model Applications decide when to merge changes from the projects they depend on into their own branches of those projects. Then those applications push changes from DEV -> TEST -> RELEASE. The downside of this model is the huge number of branches and the long path to push changes from Module DEV -> Module PUBLIC -> App DEV -> App TEST -> App RELEASE and back the other way.

Because Distributed Projects has so much overhead I prefer the Team Promotion/Application Isolation model. This model is practically the same. It only differs in that the Application's DEV branch doesn't contain dependency branches. It references them directly. The downside of that is that the Application can't decide when to integrate changes from dependencies into DEV. However, it still decides when to integrate into TEST.

Note that all of these models would still take advantage of Feature Branches when it was helpful.

Hopefully this has made some sense to you... I was strongly tempted to create some diagrams to go along with all of this and make it easier to comprehend, but I simply didn't have time.

Do you work on projects that employ branching? Do you have to deal with shared components?

Thursday, February 21, 2008

Restore the Master Boot Record

When I got my laptop about 1.5 years ago one of the first things I did was install Linux on it. I can't remember now if I installed SUSE first, or went straight to Ubuntu. I'd been through Slackware, Debian, Suse, OpenSuse, SLED10, and Ubuntu in the past on other computers. In any event, I stopped with Ubuntu.

I've always had a few reasons for playing with Linux:

To learn
To feel cool, in a super nerd kind of way
To not pay for software

Probably more or less in that order. My adventures with Linux ultimately ended with me deciding that I just wasn't that impressed. I won't go into the details here, because that is not what this post is about. The short list is:

the file system is stupid (unless you're using GoboLinux),
package managers are a double edged sword,
the battery life on my Thinkpad in Linux sucked,
it didn't have any apps that were far and away better than what I was using in Windows.

Keep in mind, I spent about 4 or 5 years coming to this conclusion, and I frequently still find myself drawn to the Open Source world. And for what it's worth, I don't personally view myself as a Microsoft fan boy. In any event, I retain the right to change my mind at any point and for any reason!

So I've had Ubuntu installed on my laptop, but I haven't used it in a year. Every now and then I would use it to pull songs off an iPod, that's about it. So it was just sitting there wasting 20 GB of my hard drive. This post is about how I removed it and restored the MBR. (really? you'd better get to the point then!)

Problem #1: Get rid of GRUB and replace it with the Windows boot loader

I'm pretty sure you do have to start here. If you remove the partitions first, GRUB will complain and refuse to let you boot into the remaining partitions... That happened to me once before, it wasn't nice.

How do you do this? It's easy, insert a Win XP disk and boot into the Recovery Console. Once there, execute fixmbr. When it warns you that "all hell will break loose, are you sure you know what you're doing?", tell it yes.

Problem #2: Umm... The Recovery Console wants an Administrator password. Leaving it blank doesn't work, and none of my passwords work... How do I get into the Recovery Console?

The first thing I tried was to go into Control Panel -> Performance and Maintenance -> Administrative Tools -> Computer Management -> Local Users and Groups -> Users. Once there, I right clicked on the Administrator and selected "Set Password..." then entered a password. Sadly, after rebooting back into the Recovery Console (and waiting forEVER while it loads...) it still wouldn't take my password.

So how do you fix it? Go back to Administrative Tools and this time go to Local Security Policy -> Local Policies -> Security Options. In the list, scroll to the bottom and find "Recovery Console: Allow automatic administrative logon." Double click and switch it to Enabled. Now you won't have to supply a password to get into the Recovery Console and you'll be good to go.

Problem #3: Remove the Linux partitions and resize the Windows partition.

I tried to use Partition Magic to do this, but it blew up with an error when I ran it, something about how some partition didn't have a drive letter. I think it was afraid of the IBM recovery partition.

How do you do this? I downloaded an Ubuntu install/live CD and used the GNOME partition editor (GParted). First unlock the swap partition. This crashed GParted... But when I ran it again the unlock had succeeded, so everything seemed good. Then I deleted the swap and the Linux partitions. Then I resized my Windows data partition (FAT32). Click Apply and go take a shower 'cause it's gonna take a while.

After all that, I now have my drive space back and GRUB is gone. I still have two Windows partitions, C (NTFS) and D (FAT32)... I wish I could convert D to NTFS but the only way I know to do that would be to copy all the data on D to some other drive, format D as NTFS, then copy all the data back. Maybe I'll do that when I get my backup solution in place. Actually, I'd like to get rid of D altogether. It made sense to have an applications partition and a data partition when I wasn't backing up my drive. But once I have a backup I'm not sure I still need two partitions.

How do you have your drive space set up?

Wednesday, February 20, 2008

.NET structs

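.NET has structs and classes. Structs are value types: a struct local typically lives on the stack instead of the heap (a struct that is a field of a class or an element of an array is stored inline inside that heap object), and structs can't participate in inheritance. However, because they avoid a separate heap allocation, they can be faster. Check out this article on MSDN for some details, scroll down to the ValueType section. As it says there,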

The flexibility afforded by objects comes at a small performance price. Heap-managed objects take more time to allocate, access, and update than stack-managed ones. This is why, for example, a struct in C++ much more efficient than an object. Of course, objects can do things that structs can't, and are far more versatile.
But sometimes you don't need all that flexibility. Sometimes you want something as simple as a struct, and you don't want to pay the performance cost. The CLR provides you with the ability to specify what is called a ValueType, and at compile time this is treated just like a struct. ValueTypes are managed by the stack, and provide you with all the speed of a struct. As expected, they also come with the limited flexibility of structs (there is no inheritance, for example). But for the instances where all you need is a struct, ValueTypes provide an incredible speed boost. More detailed information about ValueTypes, and the rest of the CLR type system, is available on the MSDN Library.

So, it would seem, if you found yourself doing some very C-like tree or list type of coding, you should use a struct instead of a class.

So it would seem. Unfortunately, it turns out this doesn't really work out so well for one very simple reason: structs are Value Types!

Take a look at these two code samples. I'll tell you in advance, the first one fails, the second one passes.

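using System.Collections.Generic;
using System.Diagnostics;

// Node declared as a struct, i.e. a value type.
struct Node
{
    public int Value;
    public string DisplayValue;
}

void Test()
{
    List<Node> l = new List<Node>();
    Node n;
    n.Value = 5;
    n.DisplayValue = "display this";

    l.Add( n );  // the list stores a copy of n

    n.DisplayValue = "Display This";  // changes only the local n

    Debug.Assert( n.DisplayValue == l[0].DisplayValue );
}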

That one fails.

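using System.Collections.Generic;
using System.Diagnostics;

// Node declared as a class, i.e. a reference type.
class Node
{
    public int Value;
    public string DisplayValue;
}

void Test()
{
    List<Node> l = new List<Node>();
    Node n = new Node();
    n.Value = 5;
    n.DisplayValue = "display this";

    l.Add( n );  // the list stores a reference to the same object

    n.DisplayValue = "Display This";  // visible through l[0] as well

    Debug.Assert( n.DisplayValue == l[0].DisplayValue );
}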

That one passes.

The first one fails because "Display This" is not equal to "display this". That is, l[0].DisplayValue returned "display this", not "Display This" as you might have expected.

What went wrong? It's simple. Structs are value types. You passed your struct into the List's Add method as a parameter. Parameters are passed by value. Therefore value types get copied when they are passed as parameters. With reference types the underlying pointer is copied, but the object is not. Thus the first code sample copies the Node and the second code sample doesn't.
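To make the copying concrete, here is a minimal sketch of what it takes to change a struct that is already inside a List. The `StructCopyDemo` class and its `Run` method are names made up for this illustration; only `Node` comes from the samples above. The idea is that the element has to be read out (a copy), changed, and then written back.

```csharp
using System;
using System.Collections.Generic;

public struct Node
{
    public int Value;
    public string DisplayValue;
}

// Hypothetical demo class, not part of the original samples.
public static class StructCopyDemo
{
    // Returns the stored DisplayValue after a local edit and after a write-back.
    public static string[] Run()
    {
        var list = new List<Node>();
        Node n;
        n.Value = 5;
        n.DisplayValue = "display this";
        list.Add(n);                          // Add receives a copy of n

        n.DisplayValue = "Display This";      // changes only the local n
        string afterLocalEdit = list[0].DisplayValue;

        // list[0].DisplayValue = "x";        // does not compile (CS1612):
        //                                    // the indexer returns a copy

        Node copy = list[0];                  // read the element out (a copy)
        copy.DisplayValue = "Display This";   // change the copy
        list[0] = copy;                       // write the changed copy back
        string afterWriteBack = list[0].DisplayValue;

        return new[] { afterLocalEdit, afterWriteBack };
    }
}
```

The write-back works, but it is easy to forget, which is a large part of why a class tends to be the more comfortable choice for this kind of node.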

update: I had stacks and heaps backwards in the first paragraph... So I flipped them.

What you need is a pointer to the struct. But C# doesn't have pointers (well, not really), so you have to use classes instead. Because of this, I think you'll find that the utility of structs is pretty limited. That probably explains why you never see anybody using them.
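The "well, not really" deserves a small footnote: for method calls, C# does offer the `ref` keyword, which passes the caller's variable itself instead of a copy. The sketch below assumes the same `Node` struct; `RefDemo`, `RenameCopy`, and `RenameInPlace` are hypothetical names for illustration. Note that this only helps at call sites. A `List<Node>` still stores copies, so the conclusion about collections of structs stands.

```csharp
using System;

public struct Node
{
    public int Value;
    public string DisplayValue;
}

// Hypothetical demo class, not part of the original samples.
public static class RefDemo
{
    // Receives a copy; the caller's struct is untouched.
    public static void RenameCopy(Node node)
    {
        node.DisplayValue = "Display This";
    }

    // Receives a reference to the caller's variable; no copy is made.
    public static void RenameInPlace(ref Node node)
    {
        node.DisplayValue = "Display This";
    }

    // Returns DisplayValue after the by-value call and after the by-ref call.
    public static string[] Run()
    {
        Node n = new Node { Value = 5, DisplayValue = "display this" };

        RenameCopy(n);
        string afterByValue = n.DisplayValue;

        RenameInPlace(ref n);
        string afterByRef = n.DisplayValue;

        return new[] { afterByValue, afterByRef };
    }
}
```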

Monday, February 18, 2008

The Elements of Design

In the world of software design, there are 3 distinct classes of "pattern" to be aware of:

Smells
Patterns
Principles

The focus of each of these is primarily on enabling code to be changed and understood.

Smells are symptoms that are frequently demonstrated by code which has not been well designed. These are good to be aware of as they indicate you should probably consider doing something to fix the code so it doesn't smell so bad anymore.

Patterns are common OO design techniques for accomplishing certain behaviors. They were originally captured in the "Design Patterns" book by the "Gang of Four." These are very helpful, less for how they can improve your code, and more as a communication technique between developers. Personally I've found Singleton, Strategy/State, and Mediator to be the most far reaching. These are about as close as Computer Science has come to having terminology that can be used to describe abstract code design concepts. In general, you might fix a code smell with a Design Pattern.

Principles are a cross between smells and patterns except instead of telling you what is bad they tell you what is good, and instead of giving you precise designs to accomplish tasks they tell you what, in general, good looks like in Object Oriented languages. Of all of the three classes, Principles are probably the most useful because they describe design in general. Interestingly, they seem to be less known. Probably because they are both newer and slightly harder to understand. Of these, I have found the Single Responsibility Principle and the Dependency Inversion Principle to be the most far reaching.
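For readers who haven't run into it, here is a rough sketch of what the Dependency Inversion Principle tends to look like in code. Every name here (`IMessageSink`, `MemorySink`, `OrderProcessor`) is invented for the illustration. The point is only that the high-level class depends on an abstraction, and the concrete detail is handed in from outside.

```csharp
using System;
using System.Collections.Generic;

// The abstraction that both sides depend on (hypothetical name).
public interface IMessageSink
{
    void Write(string message);
}

// Low-level detail: one possible sink, which records messages in memory.
public sealed class MemorySink : IMessageSink
{
    public readonly List<string> Messages = new List<string>();

    public void Write(string message)
    {
        Messages.Add(message);
    }
}

// High-level policy: knows only IMessageSink, never a concrete sink.
public sealed class OrderProcessor
{
    private readonly IMessageSink sink;

    public OrderProcessor(IMessageSink sink)
    {
        this.sink = sink;
    }

    public void Process(int orderId)
    {
        sink.Write("processed order " + orderId);
    }
}
```

Swapping `MemorySink` for a file or network sink would not touch `OrderProcessor` at all, which is the kind of "enabling code to be changed" these principles are after.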

Thursday, February 14, 2008

Goz-Inta

I've referred to one of my college professors in past posts. He had a way of presenting lessons in simple packages. One of these was a word he used: Goz-Inta.

He said, "Many new programmers have the Goz-Inta problem, they don't know what should Goz-Inta what!" New programmers write programs that consist of one gigantic function. It doesn't occur to them that some of that code should goz-inta separate functions. They don't have any experience with reasoning about what should goz-inta what. And more than that, it doesn't even seem to be an important issue. After all, a working program is the most important thing, right? HA!
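As a toy illustration of the first stage, here is a sketch of a job that could easily have been one gigantic function, split so that each piece goz-inta its own method. The `OrderReport` class and its methods are made-up names; nothing about them comes from the professor.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical example class, invented for this illustration.
public static class OrderReport
{
    // Parsing goz-inta its own function...
    public static List<decimal> ParsePrices(string csv)
    {
        var prices = new List<decimal>();
        foreach (string field in csv.Split(','))
        {
            prices.Add(decimal.Parse(field.Trim(), CultureInfo.InvariantCulture));
        }
        return prices;
    }

    // ...the arithmetic goz-inta another...
    public static decimal Total(IEnumerable<decimal> prices)
    {
        decimal sum = 0m;
        foreach (decimal price in prices)
        {
            sum += price;
        }
        return sum;
    }

    // ...and the formatting goz-inta a third.
    public static string Format(decimal total)
    {
        return "Total: " + total.ToString("0.00", CultureInfo.InvariantCulture);
    }

    // The top-level function now only says what happens, in order.
    public static string Build(string csv)
    {
        return Format(Total(ParsePrices(csv)));
    }
}
```

Calling `OrderReport.Build("1.50, 2.25, 3")` gives `"Total: 6.75"`, and each piece can be read, named, and tested by itself.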

From the time that I emerged out of the "new programmer" phase, I have found that all the major problems of "Software Architecture" can be reduced to questions of Goz-Inta. People seem to progress through the following stages:

1 big function
1 big class
1 big assembly
1 source control branch (as in, no branches)

All of these can be expressed in terms of Goz-Inta.

And that's not all! Take a look at design smells/patterns/principles. What are they all doing? They're either describing symptoms of Goz-Inta mistakes, common Goz-Inta techniques, or big picture Goz-Inta concepts. GOZ-INTA.

Ever heard people say that it's important to name your methods and classes well? That's because a good name helps the author remember what should and shouldn't goz-inta those methods and classes. A good name also helps people using the methods and classes understand what's gonez-inta them.

Why do I think that unit testing helps you design better code? It's because unit testing forces you to think about the Goz-Inta problem.

Furthermore, in my experience, the best programmers are the ones who are the best at working through the trade-offs created by the Goz-Inta problem.

Thursday, February 7, 2008

"This is a Capacitor"

In college I took a Physics II course. I had been looking forward to that class since early High School. It was basically an introduction to circuits. I was eager to learn about resistors and capacitors and voltage and how circuits were built and designed. I thought I'd gain some knowledge that I'd be able to use in everyday life.

You won't be surprised to hear that I didn't gain any of that kind of knowledge. Sadly, I doubt I retained any knowledge from that course whatsoever. In truth, I barely made it through! Of course, it was one of those classes where a 60% test score can set the curve...

Anyway, there was one lesson in that class that I will never forget. The professor started class and turned to the board and said,
"This is a capacitor."

Then he wrote "C=Q/V" on the blackboard.

Yep, those 3 letters and 2 symbols are definitely a capacitor...

Now, maybe it's just me, but when someone says "This is a capacitor," I sort of expect to glance up and see him holding one, or displaying a picture of one. I don't expect to see an equation. We never talked about what a capacitor DID, or why you might NEED one. In fact, it wasn't until I had a phone conversation with my dad that I learned a capacitor "stored charge". And I'm not even going to get into the fact that C=Q/V describes capacitance, which is a measurable quantity, and not a capacitor which is a physical device...
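For what it's worth, here is the kind of worked example I would have wanted next to that equation. The numbers are made up for illustration (a 10 microfarad capacitor charged to 5 volts); the only thing taken from the blackboard is C=Q/V, rearranged to ask how much charge is stored.

```latex
C = \frac{Q}{V}
\quad\Longrightarrow\quad
Q = C\,V
  = (10 \times 10^{-6}\,\mathrm{F})(5\,\mathrm{V})
  = 5 \times 10^{-5}\,\mathrm{C}
  = 50\,\mu\mathrm{C}
```

In words: the bigger the capacitance, the more charge the device holds at a given voltage, which is the "stores charge" idea my dad gave me over the phone.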

So what lesson did I learn from this? Different people learn and think in different ways. I'm a practical applied type of learner. That is, I need to understand and not just memorize. Thus giving me an equation or showing me a graph just isn't quite enough. I need to see steps and logic and explanation. In other words, I have to start from something I know and build up.

Other people certainly aren't this way. I've known lots of people who could simply see an equation and understand. I've known other people who could see an equation and then spit back and pass the class, but maybe not understand.

Fortunately for me, my learning style works great for computer science. Unfortunately for me, it's not how Math is taught. And only rarely how Physics is taught.

I've always believed that calculus and physics should be taught hand in hand. After all, the majority of calculus was developed trying to solve problems in physics. Add in a dash of History and you've got a thoroughly engrossing class in my opinion. But anyone who's been through college knows that isn't how it's done most places.

Instead physics is taught with algebra, because the kids don't know calculus, even if they've taken it. And that means you have to memorize and spit back more equations for more circumstances. If it was calculus the number of equations would drop and the amount of logic to handle different circumstances would rise. Great for me! But again, not how it's typically done.

Given all this, I guess I can't really blame my Physics II teacher. He was just teaching the way he thinks. Turns out a lot of people think and learn that way just fine. Even the Wikipedia article on Capacitance is written in much the same way. Of course, Wikipedia makes a distinction between a Capacitor and Capacitance and does a great job of describing Capacitors in English...

Still, I've never been convinced that people who were able to learn the equation and solve the "word problems" and ace the tests really understood any of it. Some of them did. But most of them? I find it hard to believe.

I have to end this by pointing out that there are certain topics where my style of thinking and learning just doesn't help. Things like quantum physics, for example, where no matter how you look at it, you can't reduce it to common sense. In these circumstances you have to settle for memorizing the "way things are". Chemistry is like this. And yes, lots of Physics is like this too. But I still claim that after you've internalized the "way things are", you can start to reason logically about it.