Thursday, February 21, 2008

Restore the Master Boot Record

When I got my laptop about 1.5 years ago one of the first things I did was install Linux on it. I can't remember now if I installed SUSE first, or went straight to Ubuntu. I'd been through Slackware, Debian, Suse, OpenSuse, SLED10, and Ubuntu in the past on other computers. In any event, I stopped with Ubuntu.

I've always had a few reasons for playing with Linux:
  1. To learn
  2. To feel cool, in a super nerd kind of way
  3. To not pay for software
Probably more or less in that order. My adventures with Linux ultimately ended with me deciding that I just wasn't that impressed. I won't go into the details here, because that is not what this post is about. The short list is:
  1. the file system is stupid (unless you're using GoboLinux),
  2. package managers are a double edged sword,
  3. the battery life on my Thinkpad in Linux sucked,
  4. it didn't have any apps that were far and away better than what I was using in Windows.
Keep in mind, I spent about 4 or 5 years coming to this conclusion, and I frequently still find myself drawn to the Open Source world. And for what it's worth, I don't personally view myself as a Microsoft fan boy. In any event, I retain the right to change my mind at any point and for any reason!

So I've had Ubuntu installed on my laptop, but I haven't used it in a year. Every now and then I would use it to pull songs off an iPod, that's about it. So it was just sitting there wasting 20GB of my hard drive. This post is about how I removed it and restored the MBR. (really? you'd better get to the point then!)

Problem #1: Get rid of GRUB and replace it with the Windows boot loader

I'm pretty sure you do have to start here. If you remove the partitions first, GRUB will complain and refuse to let you boot into the remaining partitions... That happened to me once before, it wasn't nice.

How do you do this? It's easy, insert a Win XP disk and boot into the Recovery Console. Once there, execute fixmbr. When it warns you that "all hell will break loose, are you sure you know what you're doing?", tell it yes.

Problem #2: Umm... The Recovery Console wants an Administrator password. Leaving it blank doesn't work, and none of my passwords work... How do I get into the Recovery Console?

The first thing I tried was to go into Control Panel -> Performance and Maintenance -> Administrative Tools -> Computer Management -> Local Users and Groups -> Users. Once there, I right clicked on the Administrator and selected "Set Password..." then entered a password. Sadly, after rebooting back into the Recovery Console (and waiting forEVER while it loads...) it still wouldn't take my password.

So how do you fix it? Go back to Administrative Tools and this time go to Local Security Policy -> Local Policies -> Security Options. In the list, scroll to the bottom and find "Recovery Console: Allow automatic administrative logon." Double click and switch it to Enabled. Now you won't have to supply a password to get into the Recovery Console and you'll be good to go.

Problem #3: Remove the linux partitions and resize the windows partition.

I tried to use Partition Magic to do this, but it blew up with an error when I ran it, something about how some partition didn't have a drive letter. I think it was afraid of the IBM recovery partition.

How do you do this? I downloaded an Ubuntu install/live CD and used the gnome partition manager (gparted). First unlock the swap partition. This crashed gparted... But when I ran it again the unlock had succeeded, so everything seemed good. Then I deleted the swap and the linux partitions. Then I resized my windows data partition (fat32). Click Apply and go take a shower 'cause it's gonna take a while.

After all that, I now have my drive space back and GRUB is gone. I still have two windows partitions C (ntfs) and D (fat32)... I wish I could convert D to ntfs but the only way I know to do that would be to copy all the data on D to some other drive, format D to ntfs, then copy all the data back. Maybe I'll do that when I get my backup solution in place. Actually, I'd like to get rid of D altogether. It made sense to have an applications partition and a data partition when I wasn't backing up my drive. But once I have a backup I'm not sure I still need two partitions.

How do you have your drive space setup?

Wednesday, February 20, 2008

.NET structs

.NET has structs and classes. Structs are allocated on the stack instead of the heap and can't participate in inheritance. However, because they're on the stack, they are faster. Check out this article on MSDN for some details, scroll down to the ValueType section. As it says there,
The flexibility afforded by objects comes at a small performance price. Heap-managed objects take more time to allocate, access, and update than stack-managed ones. This is why, for example, a struct in C++ is much more efficient than an object. Of course, objects can do things that structs can't, and are far more versatile.

But sometimes you don't need all that flexibility. Sometimes you want something as simple as a struct, and you don't want to pay the performance cost. The CLR provides you with the ability to specify what is called a ValueType, and at compile time this is treated just like a struct. ValueTypes are managed by the stack, and provide you with all the speed of a struct. As expected, they also come with the limited flexibility of structs (there is no inheritance, for example). But for the instances where all you need is a struct, ValueTypes provide an incredible speed boost. More detailed information about ValueTypes, and the rest of the CLR type system, is available on the MSDN Library.

So, it would seem, if you found yourself doing some very C-like tree or list type of coding, you should use a struct instead of a class.

So it would seem. Unfortunately, it turns out this doesn't really work out so well for one very simple reason: structs are Value Types!

Take a look at these two code samples. I'll tell you in advance, the first one fails, the second one passes.
struct Node
{
    public int Value;
    public string DisplayValue;
}

void Test()
{
    List<Node> l = new List<Node>();
    Node n;
    n.Value = 5;
    n.DisplayValue = "display this";

    l.Add( n );   // Add receives (and the list stores) a copy of n

    n.DisplayValue = "Display This";

    Debug.Assert( n.DisplayValue == l[0].DisplayValue );
}
That one fails.

class Node
{
    public int Value;
    public string DisplayValue;
}

void Test()
{
    List<Node> l = new List<Node>();
    Node n = new Node();
    n.Value = 5;
    n.DisplayValue = "display this";

    l.Add( n );   // Add receives a copy of the reference; the list and n point at the same object

    n.DisplayValue = "Display This";

    Debug.Assert( n.DisplayValue == l[0].DisplayValue );
}
That one passes.

The first one fails because "Display This" is not equal to "display this". That is, l[0].DisplayValue returned "display this", not "Display This" as you might have expected.

What went wrong? It's simple. Structs are value types. You passed your struct into the List's Add method as a parameter. Parameters are passed by value. Therefore value types get copied when they are passed as parameters. With reference types the underlying pointer is copied, but the object is not. Thus the first code sample copies the Node and the second code sample doesn't.

update: I had stacks and heaps backwards in the first paragraph... So I flipped them.

What you need is a pointer to the struct. But C# doesn't have pointers (well, not really), so you have to use classes instead. Because of this, I think you'll find that the utility of structs is pretty limited. That probably explains why you never see anybody using them.

Monday, February 18, 2008

The Elements of Design

In the world of software design, there are 3 distinct classes of "pattern" to be aware of:
  1. Smells
  2. Patterns
  3. Principles
The focus of each of these is primarily on enabling code to be changed and understood.

Smells are symptoms that are frequently demonstrated by code which has not been well designed. These are good to be aware of as they indicate you should probably consider doing something to fix the code so it doesn't smell so bad anymore.

Patterns are common OO design techniques for accomplishing certain behaviors. They were originally captured in the "Design Patterns" book by the "Gang of Four." These are very helpful, less for how they can improve your code, and more as a communication technique between developers. Personally I've found Singleton, Strategy/State, and Mediator to be the most far reaching. These are about as close as Computer Science has come to having terminology that can be used to describe abstract code design concepts. In general, you might fix a code smell with a Design Pattern.
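
To make that a little more concrete, here's a quick Strategy sketch (all the names are made up; this is just an illustration of the pattern, not anybody's production code):
interface IShippingStrategy
{
    decimal CalculateCost( decimal orderTotal );
}

class FlatRateShipping : IShippingStrategy
{
    public decimal CalculateCost( decimal orderTotal ) { return 5.00m; }
}

class FreeOverFifty : IShippingStrategy
{
    public decimal CalculateCost( decimal orderTotal ) { return orderTotal >= 50m ? 0m : 5.00m; }
}

class Order
{
    private readonly IShippingStrategy shipping;

    public Order( IShippingStrategy shipping ) { this.shipping = shipping; }

    // The Order doesn't know or care which shipping rule it's using.
    public decimal ShippingCost( decimal total ) { return shipping.CalculateCost( total ); }
}
Swapping the shipping rule is now a one line change where the Order is constructed, instead of a change inside Order itself.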

Principles are a cross between smells and patterns: instead of telling you what is bad they tell you what is good, and instead of giving you precise designs to accomplish tasks they tell you what, in general, good looks like in Object Oriented languages. Of the three classes, Principles are probably the most useful because they describe design in general. Interestingly, they seem to be the least well known, probably because they are both newer and slightly harder to understand. Of these, I have found the Single Responsibility Principle and the Dependency Inversion Principle to be the most far reaching.
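
And here's the Dependency Inversion Principle in the same spirit (again, made-up names, just a sketch): the high level class depends on an abstraction, and the low level detail implements it.
interface IReportSender
{
    void Send( string report );
}

class EmailReportSender : IReportSender
{
    public void Send( string report ) { /* SMTP details live down here */ }
}

class ReportService
{
    private readonly IReportSender sender;

    public ReportService( IReportSender sender ) { this.sender = sender; }

    // High level policy: publish a report. It never mentions email.
    public void Publish( string report ) { sender.Send( report ); }
}
Now the high level policy can be tested or reused without dragging SMTP along with it.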

Thursday, February 14, 2008

Goz-Inta

I've referred to one of my college professors in past posts. He had a way of presenting lessons in simple packages. One of these was a word he used: Goz-Inta.

He said, "Many new programmers have the Goz-Inta problem, they don't know what should Goz-Inta what!" New programmers write programs that consist of one gigantic function. It doesn't occur to them that some of that code should goz-inta separate functions. They don't have any experience with reasoning about what should goz-inta what. And more than that, it doesn't even seem to be an important issue. After all, a working program is the most important thing, right? HA!

From the time that I emerged out of the "new programmer" phase, I have found that all the major problems of "Software Architecture" can be reduced to questions of Goz-Inta. People seem to progress through the following stages:
  1. 1 big function
  2. 1 big class
  3. 1 big assembly
  4. 1 source control branch (as in, no branches)
All of these can be expressed in terms of Goz-Inta.

And that's not all! Take a look at design smells/patterns/principles. What are they all doing? They're either describing symptoms of Goz-Inta mistakes, common Goz-Inta techniques, or big picture Goz-Inta concepts. GOZ-INTA.

Ever heard people say that it's important to name your methods and classes well? That's because a good name helps the author remember what should and shouldn't goz-inta those methods and classes. A good name also helps people using the methods and classes understand what's gonez-inta them.

Why do I think that unit testing helps you design better code? It's because unit testing forces you to think about the Goz-Inta problem.

Furthermore, in my experience, the best programmers are the ones who are the best at working through the trade offs created by the Goz-Inta problem.

Thursday, February 7, 2008

"This is a Capacitor"

In college I took a Physics II course. I had been looking forward to that class since early High School. It was basically an introduction to circuits. I was eager to learn about resistors and capacitors and voltage and how circuits were built and designed. I thought I'd gain some knowledge that I'd be able to use in everyday life.

You won't be surprised to hear that I didn't gain any of that kind of knowledge. Sadly, I doubt I retained any knowledge from that course whatsoever. In truth, I barely made it through! Of course, it was one of those classes where a 60% test score can set the curve...

Anyway, there was one lesson in that class that I will never forget. The professor started class and turned to the board and said,
"This is a capacitor."

Then he wrote "C=Q/V" on the black board.

Yep, those 3 letters and 2 symbols are definitely a capacitor...

Now, maybe it's just me, but when someone says "This is a capacitor," I sort of expect to glance up and see him holding one, or displaying a picture of one. I don't expect to see an equation. We never talked about what a capacitor DID, or why you might NEED one. In fact, it wasn't until I had a phone conversation with my dad that I learned a capacitor "stored charge". And I'm not even going to get into the fact that C=Q/V describes capacitance, which is a measurable quantity, and not a capacitor, which is a physical device...

So what lesson did I learn from this? Different people learn and think in different ways. I'm a practical applied type of learner. That is, I need to understand and not just memorize. Thus giving me an equation or showing me a graph just isn't quite enough. I need to see steps and logic and explanation. In other words, I have to start from something I know and build up.

Other people certainly aren't this way. I've known lots of people who could simply see an equation and understand. I've known other people who could see an equation and then spit back and pass the class, but maybe not understand.

Fortunately for me, my learning style works great for computer science. Unfortunately for me, it's not how Math is taught. And only rarely how Physics is taught.

I've always believed that calculus and physics should be taught hand in hand. After all, the majority of calculus was developed trying to solve problems in physics. Add in a dash of History and you've got a thoroughly engrossing class in my opinion. But anyone who's been through college knows that isn't how it's done most places.

Instead physics is taught with algebra, because the kids don't know calculus, even if they've taken it. And that means you have to memorize and spit back more equations for more circumstances. If it were calculus, the number of equations would drop and the amount of logic to handle different circumstances would rise. Great for me! But again, not how it's typically done.

Given all this, I guess I can't really blame my Physics II teacher. He was just teaching the way he thinks. Turns out a lot of people think and learn that way just fine. Even the Wikipedia article on Capacitance is written in much the same way. Of course, Wikipedia makes a distinction between a Capacitor and Capacitance and does a great job of describing Capacitors in English...

Still, I've never been convinced that people who were able to learn the equation and solve the "word problems" and ace the tests really understood any of it. Some of them did. But most of them? I find it hard to believe.

I have to end this by pointing out that there are certain topics where my style of thinking and learning just doesn't help. Things like Quantum physics, for example, where no matter how you look at it, you can't reduce it to common sense. In these circumstances you have to settle for memorizing the "way things are". Chemistry is like this. And yes, lots of Physics is like this too. But I still claim that after you've internalized the "way things are", you can start to reason logically about it.

Thursday, January 31, 2008

Objects vs SoC Objects

Object Oriented Programming rocks the house. The reason why it is good can be summed up with a single word: Abstraction.

I had an awesome professor in college who said that a person is only so smart. They can only keep so many things in their minds at one time and can only fully understand some of those already limited things. The only way, then, that a person can do anything complicated is to abstract out the details. This frees up more space in their limited heads for other things.

This is true. And because it's so enormously true I'm going to provide some examples of abstractions people use all the time.
  1. Language
  2. Money
  3. Time, Dates
  4. Math
  5. Physics
Language is certainly the most powerful example. I say "Dog" and you know what I mean. I could also say "Golden Retriever." Or I could try not to be so abstract and say "Four legged animal with ears, eyes, tail, nose with extremely sensitive smell, hair all over, and isn't a cat." Of course, now I'm wrong, because there are Dogs that have no hair... So, obviously, language is a very powerful abstraction.

In computer programming, we have programming languages. "Modern" programming languages are typically considered modern because they allow more powerful abstractions. Object Orientation is one such abstraction. Though I'm not sure it really qualifies as "modern" anymore...

It's important that programming languages have good abstraction techniques. Software development is complicated. So for a person to do software development they must be able to abstract things. It's not enough to just abstract things on paper either. Things have to be abstracted in code. What are these things? Algorithms and data. That's about all there is in programming. You have data that has some meaning and you have algorithms that manipulate that data in meaningful ways.

Objects allow us to abstract both data and algorithms. Typically the algorithms are still quite detailed, but we can abstract the details into an object so we don't have to think about them. (Aside: Functional programming allows you to make your algorithms more abstract with fewer details.)

Objects let us abstract things by grouping a set of data and algorithms together into a single named container. And then it goes one step further and lets us create many instances of those containers. So now I can have a Dog object. And I can create many of them. And they can have names and colors and weights and heights and any other data they need. They can also bark and whine and soil the carpet and any other actions (algorithms) they need.

Objects also let us represent commonalities and relationships between things through fun words like inheritance and polymorphism. That just makes the abstraction that much more powerful.

So that's why Object Oriented programming rocks the house: Abstraction. But everyone (who is reading this) knows that. So why did I waste all my time writing it and your time reading it? Because I couldn't stop myself.

What I actually want to talk about is what these objects should actually abstract. As far as I can tell there are two schools of thought. I'm pretty sure the one has already mercilessly beaten the other and kicked it to the curb. But in my schooling I wasn't taught about either, so I'm gonna teach myself about them both here.

The two schools:
  1. Objects should represent "the thing" and everything about it
  2. Objects should do exactly one "thing"
Or if you'd prefer,
  1. Objects represent real world objects
  2. Objects manage "concerns" and concerns are separated between objects
In the first you have one big object that knows everything about dogs and manages everything about dogs. Wanna draw a dog on the computer screen? Dog.Draw(); Want to make puppies? Dog.MateWith( Dog );

In the second you have lots of much smaller objects which together, collectively, know all about dogs. DogDisplayer.Draw( Dog ); DogMating.Mate( Dog, Dog ); Note however that this doesn't preclude you from hiding all the various classes behind the Dog class so that a user of the Dog object could still say "Dog.MateWith( Dog )".
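
Here's roughly how the two schools look in code (the members are invented for illustration, this is only a sketch):
// School 1: the Dog knows and does everything about dogs.
namespace RealWorldObjects
{
    public class Dog
    {
        public string Name;
        public double Weight;

        public void Draw() { /* drawing code lives inside Dog */ }
        public Dog MateWith( Dog other ) { /* breeding logic lives inside Dog too */ return new Dog(); }
    }
}

// School 2: small objects that each own one concern.
namespace SeparatedConcerns
{
    public class Dog
    {
        public string Name;
        public double Weight;
    }

    public class DogDisplayer
    {
        public void Draw( Dog dog ) { /* only knows how to draw a dog */ }
    }

    public class DogMating
    {
        public Dog Mate( Dog mother, Dog father ) { /* only knows how to make puppies */ return new Dog(); }
    }
}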

As far as I can tell, current industry thinking is that Separation of Concern Objects are the way to go. However, when object oriented programming is taught it inevitably starts with the Real World Objects because that approach is just easier to understand.

MVC is an example of three Separation of Concern Objects. Just about every Design Pattern depends on Separation of Concern Objects (Observer, Strategy, State, Factory). And the point of just about every Design Principle I've learned is that you should write Separation of Concern Objects, the principles are just defining what Separation of Concern actually means (Single-Responsibility, Open/Closed, Liskov Substitution, Dependency-Inversion, Interface Segregation)!

There are a couple things you have to get over when you start creating these kinds of objects though. First, having lots of objects is not a bad thing. Second, having simple objects that only do one thing is a Good Thing. This often feels like overkill before you do it. After you do it you realize it has the potential to make your life seriously easier.

Okay, but why create many objects? Why not just put all the code in the same object? How is this going to make my life so seriously easier? The biggest reason is that if you put all that code in the same object it will get tangled. That is, two algorithms that really have nothing to do with each other (except that they apply to the same object) will start to affect each other. They'll use the same variables, but in different ways, they'll make assumptions that the others will break. This will become a problem when you need to change one of them and you suddenly and accidentally break the other.

The second reason is that you may want to allow for reuse. If everything is all packaged into one object, you can't reuse only a portion of it without the other parts. The final reason is that you may want to swap out the details of how something behaves for different details.

What would this mean for our Dog? Well, we can still have a Dog. But when we tell him to bark, he'll delegate the actual details of how to bark to a DogBark object. And when you tell him to soil the carpet, same thing.

Now if you decide that he should always bark when he soils the carpet, you can do that. And when you later realize that it's bad enough that he's soiling the carpet, but now he's waking you up in the middle of the night barking, you can change it again so he doesn't bark when he soils the carpet. And you won't have to worry that soiling the carpet and barking got all tangled.
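
In code, the delegation might look something like this (made-up class names again, and assuming using System for the Console calls):
public class DogBark
{
    public void Bark() { Console.WriteLine( "Woof!" ); }
}

public class CarpetSoiler
{
    public void SoilCarpet() { Console.WriteLine( "uh oh..." ); }
}

public class Dog
{
    private readonly DogBark bark = new DogBark();
    private readonly CarpetSoiler soiler = new CarpetSoiler();

    public void Bark() { bark.Bark(); }

    // Today's rule: he barks when he soils the carpet. If that changes,
    // only this one method changes, and barking never gets tangled up
    // with carpet soiling.
    public void SoilCarpet()
    {
        soiler.SoilCarpet();
        bark.Bark();
    }
}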

Moral of the story: SoC Objects add more abstraction to our abstraction. So when I told you that Unit Testing would improve your code design, this is why.

Wednesday, January 30, 2008

Unit Testing: MVC

So I've talked about what MVC is and looked at MVC in Rich Client Applications.

In those posts I mentioned that MVC was pretty helpful when it comes to unit testing. This is because, as I pointed out in Unit Testing and TDD, you can't test GUIs. Of course, you can if you really set your mind to it... But at this point I'm not convinced it's worth the effort.

So how does MVC help us Unit Test? Well, once you have a controller that manipulates a view, you can mock out the view and unit test the controller. So MVC gave us something in our GUI layer to test!

Here's an example. Suppose you're creating an interface in which there are three combo boxes, each one is dependent on the one before it. So, when you select a value in the first, the second one gets populated with certain data. When you select a value in the second, the third one gets populated with certain data. Change the first one, the second one loses its value, gets filled with different data, and the third one loses its value and its data is cleared. Pretty simple little UI and not all that uncommon.

So, the view is simple, it's three combo boxes. Let's ignore data binding to keep this example even simpler. So what does our view's interface look like?
  • ClearComboBox2Value
  • ClearComboBox3Value
  • SetComboBox2DataSource( data )
  • SetComboBox3DataSource( data )
Now our controller can cause the view to change state by calling the methods on the view. How does the controller know when values have been changed on the view? This could be done in two ways.
  1. Add events to the view that fire when the values of ComboBox1 and ComboBox2 change
  2. Simply put methods on the controller to indicate the values have changed and have the view call the controller
Personally I prefer the second approach. This approach makes it easier to test and easier to read the view's code. Also, it doesn't violate the dependency chain of MVC because the controller still doesn't depend directly on the view. The view depends on the controller, but that's OK. If you need to provide different controllers to the same view you can do this with Dependency Injection. So, if you're willing, let's go with the second approach.
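
Here's a rough sketch of what that could look like (the controller method names and the data lookup interface are my own invention, so treat this as an illustration rather than a recipe):
using System.Collections.Generic;

public interface IComboView
{
    void ClearComboBox2Value();
    void ClearComboBox3Value();
    void SetComboBox2DataSource( IList<string> data );
    void SetComboBox3DataSource( IList<string> data );
}

public interface IComboData
{
    IList<string> GetItemsFor( string parentValue );
}

public class ComboController
{
    private readonly IComboView view;
    private readonly IComboData data;

    public ComboController( IComboView view, IComboData data )
    {
        this.view = view;
        this.data = data;
    }

    // The view calls this from ComboBox1's SelectedIndexChanged handler.
    public void ComboBox1Changed( string selectedValue )
    {
        view.ClearComboBox2Value();
        view.SetComboBox2DataSource( data.GetItemsFor( selectedValue ) );
        view.ClearComboBox3Value();
        view.SetComboBox3DataSource( new List<string>() );
    }

    // The view calls this from ComboBox2's SelectedIndexChanged handler.
    public void ComboBox2Changed( string selectedValue )
    {
        view.ClearComboBox3Value();
        view.SetComboBox3DataSource( data.GetItemsFor( selectedValue ) );
    }
}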

Now we have a view and a controller. Can we unit test the view? You can, but I'm not. Can we unit test the controller? YES!

Why does this matter? Well, the controller now contains the actual behavioral logic of the form. By testing the controller we guarantee that that logic is correct. Now we just have to test that the view updates its state appropriately. You could do this manually by clicking around, or you could use a visual scripting framework to click around for you, or you could write code to click for you... I'm still just doing it manually.
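
For example, a test against the controller sketched above might look like this (a hand-rolled fake view instead of NMock, to keep it short, and assuming NUnit-style attributes and asserts):
class FakeComboView : IComboView
{
    public int ComboBox2Cleared;
    public int ComboBox3Cleared;
    public IList<string> ComboBox2Data;
    public IList<string> ComboBox3Data;

    public void ClearComboBox2Value() { ComboBox2Cleared++; }
    public void ClearComboBox3Value() { ComboBox3Cleared++; }
    public void SetComboBox2DataSource( IList<string> data ) { ComboBox2Data = data; }
    public void SetComboBox3DataSource( IList<string> data ) { ComboBox3Data = data; }
}

class FakeComboData : IComboData
{
    public IList<string> GetItemsFor( string parentValue )
    {
        List<string> items = new List<string>();
        items.Add( "a" );
        items.Add( "b" );
        return items;
    }
}

[TestFixture]
public class ComboControllerTests
{
    [Test]
    public void ChangingComboBox1ResetsComboBox2AndComboBox3()
    {
        FakeComboView view = new FakeComboView();
        ComboController controller = new ComboController( view, new FakeComboData() );

        controller.ComboBox1Changed( "first value" );

        Assert.AreEqual( 1, view.ComboBox2Cleared );
        Assert.AreEqual( 2, view.ComboBox2Data.Count );
        Assert.AreEqual( 1, view.ComboBox3Cleared );
        Assert.AreEqual( 0, view.ComboBox3Data.Count );
    }
}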

Again, no silver bullet, but we're 80-90% there (depending on how complicated the view is).

The post on MVC in Windows Applications ended without an answer. However I think this post clarifies that the benefits of unit testing a controller likely outweigh the complexity that comes from MVC.

Have you tried it? What do you think? Why not leave a comment? Josh always does! Is he really that much better than you?

Thursday, January 24, 2008

MVC in Windows Applications

In the last post I looked at what, exactly, MVC really is. Now I want to look at how it applies to Windows applications.

Windows Forms with data binding is basically Active MVC! The data binding keeps the view and the model up to date automatically. When the user performs an action (like clicking a button) an event is fired. Some code executes in response to the event, updating the model and causing the view to be updated. Just like in Active MVC. The only difference is we did it all in two classes, the form class (view & controller here) and the data class. We could separate the controller into its own class if we wanted to, and then we'd have the M, the V, and the C.


Are we comfortable with this? The only problem is that in Active MVC the Model takes on more responsibility because it's responsible for keeping the view up to date. But if we're data binding to the model, it has to be 1-1 with the controls we're displaying. There is some potential for this to get complicated if we need our Model to do more advanced state management. This could cut two ways: it could be a great thing, or it could become really hairy very quickly.

We could convert this to look more similar to Passive MVC by creating two models. One model will be independent of the view and managed completely by the controller. The other model will be the class we use to data bind into the View. This second model will be updated by the controller to cause values of the view to change and be read/monitored by the controller to respond to user events. With this setup we've moved more of the responsibility to the controller but we've also increased the amount of data passing that has to go on.
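
A bare-bones sketch of the two-model idea (all names invented; the binding model implements INotifyPropertyChanged so Windows Forms data binding picks up the changes):
using System.ComponentModel;

public class Customer                    // the view-independent model
{
    public string Name;
}

public class CustomerBindingModel : INotifyPropertyChanged    // what the form binds to
{
    private string customerName;
    public string CustomerName
    {
        get { return customerName; }
        set { customerName = value; Raise( "CustomerName" ); }
    }

    public event PropertyChangedEventHandler PropertyChanged;
    private void Raise( string name )
    {
        if ( PropertyChanged != null ) PropertyChanged( this, new PropertyChangedEventArgs( name ) );
    }
}

public class CustomerController
{
    private readonly Customer model;
    private readonly CustomerBindingModel bindingModel;

    public CustomerController( Customer model, CustomerBindingModel bindingModel )
    {
        this.model = model;
        this.bindingModel = bindingModel;
    }

    public void Load() { bindingModel.CustomerName = model.Name; }   // push the model into the view
    public void Save() { model.Name = bindingModel.CustomerName; }   // pull the user's edits back out
}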

And therein lies the rub. On the web, the data passing is required, there is no way around it. But in a windows application, we don't have to pass data around like this. We can write code that gets the data directly from the controls and writes it to the database if we want to. There is no need to have a model class for housing data.

So what are we gaining by using MVC in a Windows Application? We've separated the data from the GUI and we've also separated the "controller" logic from the GUI. This will be nice if we ever want to make changes to the GUI, like swapping one control for another. It's also nice because we can now write unit tests for the controller and mock out the GUI (and the model if needed).

So the big question is, are the benefits of an MVC design worth the overhead that comes with an MVC design?

To start with, it's certainly not good to have a single form class that contains EVERYTHING. In Windows Forms this is how nearly everyone writes, at least when they're starting out. It seems like the simplest option: 1 form, 1 file, 1 class. Everything is in one place!

But as the form gets bigger (more controls, more rules, more behavior) the number of event methods just keeps rising and before too long you're looking at a 2,000 line class (not including designer code...). You can't remember where anything is and you can't find anything either. It feels like being lost in a jungle.

So we need to come up with a way of breaking this into pieces, no question. We just have to decide what pieces we want. MVC provides us with three pieces. They are nice logical pieces in theory. In practice, you really have to commit to "shoveling data around."

If you were hoping I'd end this by telling you what the best practice is, I'm sorry to disappoint, because I don't know. I'm still experimenting and researching. I'd love to hear your take, or your best practice, or even approaches you've tried and despised.

Update 1/25/08: Added chicken scratch images
Update 9/22/08: Follow up post about MVP and MVC which may be more applicable to Windows Applications and talks about View State.

Wednesday, January 23, 2008

MVC is What Exactly?

MVC (Model, View, Controller) is a design pattern or application architecture, depending on how you want to look at it, used to separate the storage of data, the presentation of data, and the processing that takes place when the user interacts with the software.

It's also a huge buzzword in software development that everyone has heard about and thinks they understand but that very few people have ever actually researched or read about.

The concept of MVC was first introduced in the Smalltalk language. The Model stores data, the View presents a user interface, and the Controller handles user interaction.

It was recognized that MVC typically took one of two general forms, Passive or Active. Each form differs based on how the controller and the model behave.

Active Form:
  1. Controller handles user interaction events and updates the model
  2. State changes in the model cause the view to update (observer pattern keeps model independent of view)

Note: This means the model can be changed by things other than the controller and the view will still update.

Passive Form:
  1. Controller handles user interaction events and updates the model and causes the view to update
  2. Model is completely independent of view and controller
Clearly the Model must carry a lot of responsibility in the Active Form by maintaining state or whatever else is needed to keep the UI up to date. In the Passive Form the Model isn't required to do anything more than house the data. The Controller can manage the state changes.
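
In code, the difference boils down to something like this (names are invented; the Active model raises an event, observer style, while the Passive model is just data and the controller does the pushing):
using System;

// Active Form: the model announces changes, the view listens.
public class ActiveModel
{
    public event EventHandler Changed;

    private int count;
    public int Count
    {
        get { return count; }
        set { count = value; if ( Changed != null ) Changed( this, EventArgs.Empty ); }
    }
}
// somewhere in the view:  model.Changed += delegate { RefreshFromModel(); };

// Passive Form: the model is just data, the controller updates the view.
public class PassiveModel
{
    public int Count;
}

public interface ICounterView
{
    void ShowCount( int count );
}

public class PassiveController
{
    private readonly PassiveModel model;
    private readonly ICounterView view;

    public PassiveController( PassiveModel model, ICounterView view )
    {
        this.model = model;
        this.view = view;
    }

    public void Increment()
    {
        model.Count++;
        view.ShowCount( model.Count );   // the controller causes the view to update
    }
}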

MVC also appears in a slightly different form called the Document-View pattern used by the MFC (Microsoft Foundation Classes) framework which combines the controller and the model into one object, the Document.

MVC as seen in Ruby on Rails is basically the passive model. The controller is invoked by a user interaction. The Controller builds the model (or tells it to build itself) and/or modifies it as needed, the controller then picks what view to render and passes it any data it may need.

ASP.NET's new MVC framework works almost exactly the same way as Ruby on Rails in terms of MVC (surprise surprise!).

Passive MVC makes tons of sense on the web because the Request/Response nature of HTTP requires a very structured data passing model. Ultimately, that's all MVC is, a standard for how to pass data around:

Data is housed in the Model, accessed and modified by the controller, passed to the view to be displayed, passed back to the controller from the view w/ changes made by the user, etc...

So, MVC is a great thing on the web. It simplifies the code, makes it obvious where to go for what, standardizes the flow of data and rendering of web pages, and exactly matches the behavior of HTTP.

That being said, there are still some variations:
Does the Model match the database schema and just house data (as in .NET Typed Datasets or the new LINQ to SQL objects)? Or does the Model present data in an intelligent (problem domain specific) way and contain high level logic for modifying the Model? I'm not sure it actually matters, honestly. It's just a question of where you want to put your logic: all in the controller, or spread between the controller and the model?

So, now we've surveyed MVC, talked about some frameworks that use it, and decided that it works really well for the web. In the next post I'll look at if MVC makes sense in a Windows Application.

Update 01/25/08: Added chicken scratch images

Monday, January 21, 2008

Web UIs vs Rich Client UIs

Nearly every application has a User Interface of some sort. It could be a configuration file, a command line, a Windows Form, or a web site. A software application's User Interface is the part of the software that affects the user the most because it's the only part that the user has direct interaction with. Unfortunately the UI is also the hardest part of the software to quantify and measure, so it's hard to determine if one UI is better than another. And further complicating it, a user might like a UI that they don't work as efficiently with and hate a UI that actually helps them get their work done faster. It's very subjective.

Before I go any further I ought to point out that having a good UI doesn't mean you have good software. The primary function of software is to fulfill a user's goals.

In any event User Interfaces are clearly important. So where do you find the best user interfaces? I think most people these days would say that Rich Client UIs are the best. They're more consistent, easier to develop, have shortcut keys and menu bars and toolbars, and have fast response times. Meanwhile, web apps are all different and inconsistent, are very difficult to develop, aren't as easy to interact with, and have high latency.

At least, that's what everyone seems to be thinking. Even in articles where very smart people like Joel Spolsky and Jeff Atwood are discussing how web apps are going to replace desktop apps you can still see those underlying assumptions.

Of course, AJAX is changing things. Now we have Gmail and Google Calendar and 30 boxes and Jott and Remember the Milk and Fog Bugz and Scrybe and Google Maps and Yahoo Maps...

Maybe it's just me, but I like the UIs of these web apps better than what Outlook has to offer, or iCal, or Thunderbird (and Sunbird) or TFS or Microsoft's Streets and Trips. Why? It comes down to three reasons:
  1. They are more dynamic
  2. They are easier to read and understand
  3. They are prettier

By more dynamic I mean, when you hover over something, information is displayed in a nearby panel. Or when you click a row, it expands to show you more detailed information and give you links to perform actions. Also, the display of data is usually customized so that, for example, if you're seeing a list of todos and one is longer than another, they each take up whatever space they need, making them easier to read. In a rich client they're typically forced into the same amount of space in some kind of Grid control.

They're easier to read and understand because everything isn't displayed in static grids and panels. Instead, it's displayed more how you'd display the same information in a report. That is, it's laid out like a Flow Document. This makes it easy for your eye to find the information you're looking for and remember where to find that information in the future. Also, things tend to be broken into display and editing in two different interfaces. This way you're not staring at a grid of labels and text boxes, some with data in them and some without.

And finally, they're prettier. They have color and logos and icons. Most rich client apps are either a sea of grey, or washed out white, with black text over top.

So from a strictly functional standpoint, these Web UIs are better than any Rich Client UIs I've used to accomplish the same tasks. Now when you add in cross browser compatibility problems, the steep development cost, and the potential for high latency, things become less attractive. And this inevitably leads me to ask, "Why don't my Rich Client Applications behave as well as these Web Applications?"

The answer is it's actually harder to do this kind of stuff with the existing frameworks in Rich Clients. For example, there is nothing in Windows Forms which will let you create an html style layout. You can simulate it by creating a custom control for every item and state you may need to display and throwing those into a Flow Layout Panel, but the overhead involved in developing this and the performance costs when it's completed make it prohibitive. Ultimately you'd have to just perform your own custom GDI drawing or use an HTML rendering engine and write in HTML. Neither of these is really a very good solution.
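
For what it's worth, here's roughly what that simulation looks like (TodoItemControl and Todo are hypothetical; FlowLayoutPanel and its properties are standard Windows Forms, and this assumes using System.Windows.Forms and System.Collections.Generic):
void PopulateTodoPanel( FlowLayoutPanel panel, IList<Todo> todos )
{
    panel.Dock = DockStyle.Fill;
    panel.FlowDirection = FlowDirection.TopDown;
    panel.WrapContents = false;
    panel.AutoScroll = true;

    foreach ( Todo todo in todos )
    {
        TodoItemControl item = new TodoItemControl( todo );   // one custom control per item/state
        item.AutoSize = true;                                 // let each item take the space it needs
        panel.Controls.Add( item );
    }
}
It works, but the development overhead and the performance cost are exactly what make this prohibitive, as I said above.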

WPF may offer a solution to this. I haven't actually done any development in it yet, though I've done plenty of reading. For now, I'm hopeful and looking forward to finding some free time to experiment with it.

Ideally I'd love to see Rich Client applications which took a more flow document layout style approach like you see on the web as well as more dynamic interfaces capable of showing the most pertinent information in the "report" view and allowing the details to be displayed through a web style progressive disclosure model.

And I'd love to see tab controls replaced with web style Navigation bars that toggle the "report page" or navigate to "anchors" in the page. These are simple but powerful UI patterns that you rarely see in a Rich Client environment.

How do you edit data? Either allow users to click and open a pop up to perform edit tasks (like Properties Windows) or dynamically display an edit control when the user clicks on the read only data. These two patterns both allow you to perform very simple spot editing or display more complicated edit UI when needed.

The fun part of all this is that none of it is new. You use it every day when you browse the web. It's time designers started incorporating the good ideas from the web in Rich Clients. And it's time Rich Clients were developed with a Framework that makes dynamic UIs possible!

Wednesday, January 16, 2008

Unit Testing: Gateways

In an earlier post I gave an overview of Unit Testing and TDD. Now I'm going to start drilling down into some actual examples of how to structure your code to get the most out of Unit Testing.

In the last post I made the case that Unit Testing improved your object oriented design. I stand by that. But there is a stumbling block. You can still write badly designed code and add badly designed unit tests.

So how do you know what's bad? Just answer these two questions:
  1. Are your tests testing exactly one thing?
  2. Does the object you're testing serve exactly one purpose?

If the answer to both questions is yes, you're doing great. If not, you may need to refactor.

Of course, those questions are easy to ask, but much harder to answer.

My first example will have to do with code that retrieves data from a database or some other data store. When first starting out people tend to write classes that include data retrieval alongside various "business logic" routines. For example, you may create a class that takes a data table as input, validates it for certain restrictions, and inserts it into the database.

This class does two things, it validates and it inserts. If you unit tested it your tests would be testing two things, that the validation was performed correctly and that the insert into the database was performed correctly. So, how do we refactor this?

Use the "Gateway" pattern. A Gateway is a class which serves as the path to and from your data store. The, err, gateway to your data. I always create one Gateway per "functional unit." So in this case I'll have a Validator class and a ValidatorGateway. So far our example only requires the ValidatorGateway to have an insert method.

Once we perform this refactoring we'll unit test our Validator and we'll use NMock to mock out our dependency on the database by providing a mock Gateway (through dependency injection). Now when we Unit Test our Validator class we won't be testing that it inserted into the database correctly, only that it intended to insert into the database correctly.

However, isn't that still two different things? Validation and the intention to insert? It is. So we should probably take the Gateway out of the Validator all together. Instead, our Validator will just validate the data and tell us if it passed or failed. Then some other code will utilize the Validator and the Gateway.
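
Here's a sketch of where that refactoring ends up (the names, including TableImporter, are made up for illustration):
using System.Data;

public interface IValidatorGateway
{
    void Insert( DataTable table );
}

public class ValidatorGateway : IValidatorGateway
{
    public void Insert( DataTable table ) { /* the real database code lives here */ }
}

public class Validator
{
    public bool IsValid( DataTable table )
    {
        // whatever the real restrictions are
        return table != null && table.Rows.Count > 0;
    }
}

public class TableImporter      // the "some other code" that uses both
{
    private readonly Validator validator;
    private readonly IValidatorGateway gateway;

    public TableImporter( Validator validator, IValidatorGateway gateway )
    {
        this.validator = validator;
        this.gateway = gateway;
    }

    public bool Import( DataTable table )
    {
        if ( !validator.IsValid( table ) ) return false;
        gateway.Insert( table );
        return true;
    }
}
Now the Validator tests never touch a database, and the TableImporter tests just hand in a mock (or hand-rolled fake) IValidatorGateway and assert that Insert was or wasn't called.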

Clearly those two simple questions can be very powerful in guiding your design if you take the time to ask them. Furthermore, the Gateway pattern is a very simple and very flexible concept that can easily be applied to nearly any situation. As a freebie, the gateway pattern also gives you the ability to swap out what kind of data store you're using very easily. Need to switch from talking directly to the database to talking to a web service? Just update the gateway and you're done.

Thursday, January 10, 2008

VS2005: Referencing an exe

If you create a VS2005 solution which contains an exe project and any other project type which has a project reference to the exe project, chaos will ensue.

You may see errors like the following:
Could not find file 'someResource.ico' referenced by assembly 'C:\path\to\your\assembly.exe.manifest'.

You might even see the error:
Could not find file 'Microsoft.Windows.CommonLanguageRuntime'

It turns out this is a bug in Visual Studio 2005:
KB907757
msdn forum thread
other msdn forum thread

We ran into this problem when creating a project to contain Unit Tests that references the actual application project. The work around we're using is to turn off the "Enable ClickOnce Security Settings" check box in the Project -> Properties -> Security menu. Unfortunately, when you publish an application through ClickOnce this option is automatically checked. That means you'll have to turn it back off after you finish your publish so your Unit Test solution doesn't fail to build.

The other work around is to replace the project references with references to compiled dlls. However, that clearly isn't very helpful if you're trying to do TDD.

From what I can learn, it seems that this bug is fixed in VS2008 but I haven't had an opportunity to test that for myself yet.

Wednesday, January 9, 2008

Leadership

In High School and College teachers and professors (of the liberal arts...) pay a lot of lip service to "Leadership." Of course they never really tell you what it is, they just tell you to do it.

Usually this was most apparent in group projects. The group's first order of business was always to select a leader. This was done by someone asking, "Who wants to be group leader?" Everyone would give each other blank stares until someone finally responded with, "I'll do it, I guess."

Sometimes the group leader would simply end up being the group secretary, responsible for writing down the answers and turning in the final paper. Other times the poor bastard would actually try to "lead." Depending on who the leader was, this would take a few different forms. Some leaders would try to make all the decisions and tell people what work they were going to do, even before anyone had started brainstorming about the project. Other leaders would wait for someone else to come up with a good idea and would then say, "Okay, here's what we're going to do!" and repeat the idea.

So basically school was worthless when it came to understanding or teaching leadership, and group projects were only good as an exercise in frustration (which is clearly a good exercise for the real world).

Ultimately, I think leadership is simply organizing the behavior of a group of people. So, basically, it's telling people what to do. Good leaders tell people to do the right things and don't piss people off when they tell them. Good leaders also resolve disputes by helping others make good decisions. Good leaders are also able to take the best advantage of the skills of the members in the team.

Over the years in observing how people try to lead, and how other people seem to lead without trying, I've decided leadership has more to do with personality than anything else. These are the three ways I've identified that seem pretty common:
  1. Lead by decree
  2. Lead by consensus
  3. Lead by respect

The people who lead by decree come into a meeting with their minds made up and tell everyone else what the team is going to do. Sometimes these people try to pretend they want other people's input, but instead of actually listening to what people are saying they just wait for them to finish talking so they can go back to convincing everyone to do what they want.

The lead by consensus type is more democratic. They bring their ideas to the table along with everyone else's and allow everyone to argue it out. Whichever idea the team decides on wins. The lead by consensus type of person typically doesn't throw their weight around to get what they want, even if they have any weight to throw. That doesn't mean they give up on their idea completely, but it means they usually defer to the group before trying to issue a decree. This can be a problem when a decision really needs to be made or when a firmer hand is warranted, because you don't want to waste forever arguing around in circles. Typically I think I fall into this category.

The lead by respect person is a rare and impressive find. These are people who everyone simply likes and respects so much that other people actually want them to lead. I've only known a few of these kinds of people and I've never been fully able to understand how they did it. Sometimes these people never even come across as telling people what to do.

These are the three "categories" I thought up, and as you can tell they focus more on how a single person actually behaves. I did some brief google research and found Lewin's Leadership Styles. Interestingly, these correspond somewhat well with my own take, but they capture more of the gray area:
  1. Authoritarian Leadership - leading by decree
  2. Participative Leadership - leading by consensus but retaining the final say
  3. Delegative Leadership - leading completely by consensus and/or allowing everyone to make their own decisions

Some interesting thoughts. No one leadership style can be considered the best. It depends on the circumstances. For example, if the leader is the most skilled of the group, Authoritarian leadership makes the most sense. However, if the peers are primarily equal, Participative makes the most sense.

The leadership style also dramatically affects the group members. If the group members are very independent people, they may prefer a Delegative leadership style. But, while they may prefer it, it may not be the most effective.

Finally, I've been in very few situations where there really was one leader. Leadership tends to float around, landing on different people at different times. The title doesn't float around, but the actual act of leading does. I think it's this fact that makes effective leadership important. You have to recognize that leading doesn't mean controlling every detail. And you have to work well with other people when the leadership bubble has landed on them. Otherwise you're just going to piss everyone off.

Monday, January 7, 2008

Versions Followup

In the previous post, Two Versions of the Same Shared Assembly I outlined a problem as well as many solutions, suggesting the solution I thought was the best choice. Since then I have run into one very serious problem, and one slight concern.

Let's start with the serious problem. I implemented the solution of putting the version number in the assembly name. This worked great at run time, but the Visual Studio designer started bombing out. The weird part is it worked great in one project, but not at all in another. What's the difference between the two projects? I have no idea. They are both referencing the same project which has the SharedAsm built as a dll w/ a version number in the name as follows:

App1 -> Comp1 -> Comp1Utils -> SharedAsm.1.0.0.0
App1 -> SharedAsm.1.0.0.1
---
App2 -> Comp1 -> Comp1Utils -> SharedAsm.1.0.0.0
App2 -> SharedAsm.1.0.0.1


App1 doesn't work, App2 does work. I thought it might have something to do with the alphabetical ordering of the project names affecting which assembly VS used, so we renamed App1 to be the same as App2 relative to SharedAsm. That didn't change anything; App1 still failed and App2 still worked.

I can't explain what is going on here yet, but it seems like the VS Designer is arbitrarily picking one of the two versions of SharedAsm in the AppDomain when it creates an instance of a type from that assembly. In the one project, it picks the wrong version, in the other project it picks the right version. Ultimately this results in the designer bombing out because of some code which tries to cast an object to a known type, but in this case the versions on the type don't match... boom.

Here's an example:
Form1.cs:
private SharedAsmType var = new SharedAsmType();
...
var.TypeOfInterest = typeof( Form1 );


Inside SharedAsmType the TypeOfInterest is constructed using the Activator class. This is then cast to a known type in the SharedAsm assembly. It is this cast that fails.

The reason for this is that SharedAsmType is coming from SharedAsm.1.0.0.0 but the TypeOfInterest inherits from a type in SharedAsm.1.0.0.1. Thus the activator returns a SharedAsm type different from the type that we're executing in at that point, and the cast fails:
SharedAsmType2 t2 = (SharedAsmType2)Activator.CreateInstance(t);


What should happen is SharedAsmType should come from SharedAsm.1.0.0.1. It doesn't make any sense to me at all why it's coming from 1.0.0.0... It really looks like the VS designer is just picking a SharedAsm assembly, ignoring which one is actually referenced by App1.

If anyone can shed any light on this one, I'd absolutely love to hear about it.

The other problem is one I haven't directly tested but which I imagine would be an issue. If your existing code base had any code in it which used reflection to reference the SharedAsm assembly by name, you wouldn't know which version you were going to end up with. I don't know if .NET would just pick a version, or if it would bomb out... This one is just a problem of having two versions of a shared assembly loaded in the AppDomain at the same time and using reflection. The only one of our solutions which wouldn't suffer from this would be the ILMerge /internalize solution. And again, the memory usage side effects of that just aren't acceptable.
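
I haven't tested it, but if I were poking at that scenario I'd probably start with something like this (standard System.Reflection calls; the assembly names are just the ones from the example above):
using System;
using System.Reflection;

static void DumpSharedAsmInfo()
{
    // See exactly which versions of SharedAsm are loaded in the AppDomain.
    foreach ( Assembly a in AppDomain.CurrentDomain.GetAssemblies() )
    {
        Console.WriteLine( a.FullName );
    }

    // Loading by simple name: with two versions in play, I honestly don't know which one this returns.
    Assembly whoKnows = Assembly.Load( "SharedAsm" );

    // Loading by full display name at least says which version you meant
    // (PublicKeyToken=null assumes SharedAsm isn't strong named).
    Assembly pinned = Assembly.Load( "SharedAsm, Version=1.0.0.1, Culture=neutral, PublicKeyToken=null" );
}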

So, unfortunately, this seems to bring me back to square one again... I'll post any information that may come up as I continue to fight with this problem and I'd really love to get some insight from the vast world of .NET developers out there. Surely there are more .NET programming shops running into the need to create "packages" that contain shared assemblies!

Thursday, January 3, 2008

Two Versions of the Same Shared Assembly

This post is a follow up to an earlier cry for help: Shared Assemblies, Components, and Applications

I actually have a few workable solutions to the problem I presented there now. But you don't have to read that old post, I'll present the problem again here in a lot more detail:

Suppose you have some .NET assemblies which depend on each other as follows:
  • App1 is an executable application
  • Comp1 is a dll library, used by App1
  • Comp2 is a different dll library, used by App1
  • SharedAsm is a dll library used by App1, Comp1, and Comp2
Notice that there can be three different versions of SharedAsm being referenced here: App1, Comp1, and Comp2 may each be built against its own version. The problem I have been trying to solve is, how would you set up such a system with .NET?

Let's step back first and look at why you might run into this. It's quite simple. Component1 and Component2 are pretty beefy modules. So beefy in fact that they each have their own development teams. The components aren't just used by App1 either. There are dozens of applications that use these components. This is a perfectly common occurrence. The only reason it is a problem is because of the dependency on SharedAsm.

Here's why: Comp1 is working with version 1 of SharedAsm. They want to release their component to the App1 development team. But App1 is using version 3 of SharedAsm. Version 3 is not compatible with version 1. If we're working in Visual Studio here this will be a serious problem.

Suppose App1 is set up with a solution that has all project references. If this is the case, Comp1 won't compile because it was expecting version 1 of SharedAsm but it's going to find the current source control version instead.

Okay, suppose App1 is set up with a solution that references dlls instead. This still won't work. When the dlls are copied into the output directory one version will overwrite the other version... Boom.

It seems that the only option is for the Comp1 team to update their code to use the same version of SharedAsm as App1 is using. But what if App2 is using a different version? Well, let's just make it company policy that all active projects must use the latest versions of all assemblies. But what about App3 which is no longer in active development and doesn't have a development team anymore?

Let's sum up our troubles:
  • Comp1 can only be used in any application as a project reference if that application will reference ALL assemblies as projects and be willing to immediately update to new versions when they are checked in (sloooooooow build times w/ all those projects, very fragile environment as the slightest bad checkin breaks everyone, hard to deal with legacy projects)
  • Comp1 can't be "released" as dlls because of the potential for shared assembly version conflicts
  • We can't work on non active projects without updating them to the newest versions of all components or performing fragile and complicated branch magic in our source control system.
Are there any completely automatic solutions to this problem supported by Visual Studio? Nope.

Are there any solutions for this at all? Luckily, yes.
  1. Add the version number to the end of the dll names by changing the Assembly Name property of the visual studio project before building it.
  2. Use ILMerge to combine the SharedAsm.dll and Comp1.dll into a single Comp1Merged.dll using the /internalize option.
  3. Install SharedAsm to the GAC and make sure all references to it in the Visual Studio projects are set to SpecificVersion = true and CopyLocal = false.
I think the first solution, appending the version number to the name, is the best all around solution.
  • + It can be done from within Visual Studio requiring no outside tools or custom build scripts
  • + It generates pdbs with no effort allowing people using the assembly to step through the code at run time
  • + It's memory efficient, different versions of SharedAsm will only be loaded if different versions are actually being used
  • + The SharedAsm team only has to add the version number. Nothing unusual has to be done by any other teams.
  • - It requires someone to manually update both the version number and the assembly name
  • - It requires that SharedAsm go through "structured releases" in which the version number is incremented and the dlls are made available to anyone who wants or needs to upgrade
  • - To upgrade, teams using SharedAsm have to both remove the old dlls from their project and update the references (this can be automated with a script if necessary) (and fix errors)
The second solution, ILMerge, works great, but it has a few more serious downsides. For more details on this solution refer to this post on Jon's ReadCommit blog.
  • - Requires an outside build script to run ILMerge (or a post build event in VS? or a project file modification?)
  • + PDB files are merged as well
  • -- Potentially very memory inefficient as n versions of SharedAsm will be loaded regardless of whether they are actually different versions. This is a deal breaker for the way I need to use this pattern as I will have way more than 2 components using the same shared assembly.
  • +/- The process is managed by the Comp1/2 development teams, the SharedAsm team doesn't have to do anything special.
  • - It requires that SharedAsm go through "structured releases" in which the new dlls are made available
  • + To upgrade teams using SharedAsm just copy the new version over the old version (and fix errors)
The third solution, GAC, also works great but requires the most work and maintenance.
  • - Dlls must be placed in the GAC on every developer's computer
  • - The Comp1 and Comp2 teams must release a script to add the needed dlls to the GAC along with new versions of their comp1/2.dll
  • - No PDBs for dlls in the GAC, so you can't step through SharedAsm code when it's used by the components
  • - The process requires effort from everyone
  • + As memory efficient as the version number solution
  • - Dlls must be placed in the GAC, regardless of whether they really make sense there
I have tested all of these methods and they do all work. I haven't actually started using them in our development environment, so I'm only speculating about the pros and cons that will arise in practice.

For the record here are some non-solutions:
  • .netmodules generated with csc.exe
  • use of the extern keyword and aliasing references (requires too much foresight and work to be practical)
  • renaming dlls after they've been built (you must change the Assembly name before you build or .NET will not be able to find the dll to load it)
  • ILMerge w/o the internalize option. This will work if App1 doesn't directly reference SharedAsm. But as soon as you add the SharedAsm reference you will receive a compiler error: The type "sharedasm.type" exists in both "comp1.dll" and "comp2.dll." This is because all three copies of the type are visible and the compiler doesn't know which one you wanted to use. The /internalize switch solves this problem by making the types internal in comp1 and comp2.
Big thanks to Jon and his ReadCommit blog which taught me about the /internalize switch of ILMerge.

FOLLOWUP: Versions Followup

Monday, December 31, 2007

Polishing the Golden Shovel

My senior year of High School I took AP English. I hated that class, but I have to admit I did learn a little about writing. One thing I learned was that you got a higher grade if you included your own insights and opinions in your "book review" type papers.

My junior year of College I took History of Aviation to fulfill the last of my History credit requirements. It was a horrible class. The material was interesting, and the books we used were decent, but our professor was… well, boring. And the tests and assignments were simply a waste of time. It was quite literally like going back to freshman year of High School, with one-page, five-paragraph "chapter summary" papers due every week.

I can’t recall exactly what grades I got on the first few papers, but a B- or C comes to mind. The professor had underlined roughly half of my essay (not to indicate anything, I think she was just one of those people who underlines as they read so they don’t lose their place...). Then she’d scrawled "more examples" or something to that effect at the top. I was kind of peeved. I’d read the material, and I thought that was obvious given the content of my "essay." And like I’d learned in AP English in High School, I’d added in a lot of my own personal take on what the reading had taught me and how it applied to whatever idiotic prompt we’d been given that week. That was clearly not what the professor was looking for.

It took me a few papers to finally realize that this was a class where it didn’t matter what I was learning, or how well I was writing. This was a class where I had to figure out what the professor wanted to see in those papers. It didn’t matter if what she wanted to see made sense, or was good or bad, or furthered my education... It only mattered that she wanted to see it.

So I developed a method for writing those papers. I would look at the prompt we’d been given. Then I’d skim through the material we were supposed to read. When I found a sentence or paragraph that applied to the prompt somehow, I’d underline it and add a mark at the top of the page so I could find it again. When I’d finished skimming the material, I’d go back through and find everything I’d underlined. I’d summarize the sentence and put the page number. Then I would number each quotation 1-n in an order that would sensibly answer the prompt. Finally I’d write my paper. The introduction would restate the prompt. The rest of the paper would be direct quotes (or sometimes paraphrased quotes, always referenced) with transitions between them. The conclusion would restate the prompt. Each of my little 1 page essays would include about 8 to 12 quotes.

Let me drive that home. I literally wrote NOTHING except to transition between quotations. I’m not overstating this, that’s all I did. And while I think I did a decent job masking it with my transitions, it was still quite obvious.

I got As on every paper for the rest of the semester.

My dad once taught me a term which summarizes what I was doing here wonderfully. When I would tell him about some stupid paper I had to write, he would say, "Polish your golden shovel!" The term refers to all the Bull Shit you’re about to shovel into your paper and attempt to pass off as golden writing.

Of course, usually when you’re putting your golden shovel to work it’s not as blatant as my History of Aviation story. In fact, in normal use it’s a very valuable skill to have. Sometimes you have to sharpen the saw, and other times you have to polish the golden shovel.

Friday, December 28, 2007

SQL Server 2005 Service Broker

Service Broker is a very nice feature of SQL Server 2005. Like most things, it’s quite simple after you’ve figured it all out, but when you’re first learning, it can be quite daunting. When it comes to learning about SQL Server the main problem is that all the words Microsoft uses to describe things are so overloaded it’s easy to get confused. Everything is an "application" and everything is a "service"... This rapidly becomes confusing... "Which application are they talking about now? Theirs? Mine? What service? My service? The Service Broker service? That stored procedure?"

It also becomes confusing because SQL Server has so many different "technologies" built into it. A lot of them seem to do the same thing, but in fact they’re different, some more than others. For example, there is Notification Services, Service Broker, Event Notification, Internal Activation, External Activation, etc. And then it gets even more confusing because some of these "technologies" are just combinations of other "technologies." External Activation, for example, is really just Event Notification applied to Service Broker and it uses Service Broker Queues to deliver its messages.

In this post I’m going to briefly go over some of the things I’ve learned about Service Broker. If you’re completely new to Service Broker and you want a quick detailed overview with code samples you may want to go read this tutorial or browse through MSDN’s service broker articles.

So then, what is Service Broker? It is simply a queued message delivery system. A message sent from one person to another is guaranteed to not only arrive but arrive in order, and in the event that something goes wrong, not be lost. Of course there are a lot of details to describe exactly how it behaves, but that’s really all there is to understanding it.

What would you use it for? Lots of things! The uses I’m most concerned with are "Asynchronous triggers" and "Large-scale batch processing." In other words you can drop a message into a queue and then forget about it, continuing with your own work and ultimately completing. In the background, you know that your message will at some point be received and handled. In this way you can offload work from stored procedures or triggers to be performed asynchronously.

So here is the big question: if we’re sending messages from person A to person B, who or what are these people? They are anything capable of executing SQL commands. Thus it can be a stored procedure, a C# application using dynamic SQL, a Windows Service, anything. Notice that any of these things can be either the sender or the receiver!

Here’s a simple example. The sender can be a stored procedure which you invoke. This stored procedure will simply execute the commands needed to send a message from itself to the receiver. The receiver can also be a stored procedure, but a slightly unusual one. This stored procedure will be configured to start when SQL Server starts and it will run indefinitely in a loop, periodically checking for new messages. If this is beginning to sound a lot like socket programming to you, then you already understand Service Broker.
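
As a sketch of the sending side, here's roughly what it boils down to if the sender is, say, a C# application using dynamic SQL rather than a stored procedure. All of the Service Broker object names (the services, contract, and message type) are invented for illustration and would have to exist already:
// A minimal sketch of sending a Service Broker message from C#.
// SenderService, ReceiverService, SampleContract, and SampleMessage are
// made-up names; the message type is assumed to use VALIDATION = NONE.
using System.Data.SqlClient;

class Sender
{
    public static void SendMessage(string connectionString, string messageBody)
    {
        const string sql = @"
            DECLARE @handle UNIQUEIDENTIFIER;
            BEGIN DIALOG CONVERSATION @handle
                FROM SERVICE [SenderService]
                TO SERVICE 'ReceiverService'
                ON CONTRACT [SampleContract]
                WITH ENCRYPTION = OFF;
            SEND ON CONVERSATION @handle
                MESSAGE TYPE [SampleMessage] (@body);";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@body", messageBody);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

Note that this sketch starts a brand new dialog for every message; as discussed further down, reusing a cached conversation handle is noticeably cheaper.
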

Where is the queue in this example? Well, actually, there are two of them. The sender has a queue and the receiver has another. Why? So far I’ve been describing this as a monolog: Person A always sends messages to B, who always receives them. B never sends anything to A. But, in fact, that’s not the way Service Broker works. Service Broker always assumes a dialog in which A and B are both sending and receiving messages to and from each other. That doesn’t mean you have to carry on a dialog; in fact, when I’ve used Service Broker I’ve only had a monolog. But that’s simply the way I’ve written the applications that are using Service Broker. I didn’t do anything different in the way I set up Service Broker to make it work that way.

Back to our example, since we have two stored procedures, could we put them in two different databases? Indeed we can! In fact, that is one of the main uses of Service Broker. You could put them in two different catalogs of the same server, or you could put them in two completely different servers. Now you can effectively have two different databases talk to each other.

Having a stored procedure that is always running and monitoring the queue in a loop works just fine and there is really nothing wrong with that approach. However, Service Broker comes with another option for receiving from a queue called "Internal Activation." Basically, SQL does the looping in the background for you and when a message arrives on the queue it invokes a stored procedure. This stored procedure pulls the message off the queue and does whatever it wants to do with it. Usually the stored procedure is written to then enter a loop where it pulls messages off the queue until the queue is empty, at which point the stored procedure ends. Internal Activation will also invoke multiple instances of your stored procedure if the currently running procedures aren’t able to keep up with the number of messages arriving at the queue. In this way Internal Activation can help your message handling scale based on demand.

When I first heard about this a little red flag went up in my brain. I thought to myself, "Self! I thought messages were guaranteed to be delivered in order with Service Broker. If Service Broker is going to launch multiple "handler" stored procedures in parallel, how is it that the messages will be handled in order?" It turns out the answer lies in another Service Broker concept I haven’t mentioned yet called a conversation. Messages have to be sent on a conversation. Messages within the same conversation will arrive AND be handled in the order they were sent, guaranteed. But messages on different conversations can be handled in parallel. This behavior is performed magically in the background without you having to do anything. It happens because whoever receives a message on a new conversation automatically obtains a lock on that conversation, and no one else can receive messages from that conversation while the lock is held. But again, if no one had told you, you’d scarcely be aware of it at all.

If there is Internal Activation, there must be External Activation as well, right? Right. Internal Activation launches a stored procedure inside SQL Server, while External Activation is for launching arbitrary programs outside SQL Server: .NET programs, C++ programs, Ruby programs, whatever. At least, that’s what you’d expect. In fact it’s kind of a misnomer. The Internal Activator launches the internal stored procedures; Internal Activation refers more to when the activator should do its work. The same is true for External Activation. The difference is that SQL Server 2005 doesn’t come with an External Activator, so you have to write your own. You can do this because SQL Server 2005 does have External Activation, which tells you when a queue has been activated.

External Activation is performed with SQL Server Event Notification. In the parlance, you’d write an external application which receives the external activation event (called the Queue Activation event). Okay... So how does the external app "receive an event?" Through Service Broker! That’s right, you create another Service Broker Queue and the Queue Activation event is placed in that queue as a message. Then you write an external application that monitors this new queue and creates new application processes which behave just like the activated stored procedures from Internal Activation. I’m not going to go into more details of how this works. It is a bit more complicated of course. However, Microsoft does have a sample.

How does this external application monitor the Service Broker queue? Exactly the same way as our first example stored procedure did. It just sits there in an infinite loop waiting for messages to arrive on the Queue. In other words, it just executes a SQL statement inside a loop.

So, what’s the difference between using an External Activator and just writing an application that monitors the actual queue in question (not the "queue activation" queue)? The only difference is that the External Activator will spawn off multiple handler applications and therefore scale better with demand. Just writing a monitoring application will only handle one message at a time (unless you get fancy, which could get you into trouble in some circumstances due to the conversation locking mentioned earlier). However, there are circumstances where this is all you need. In fact, all the ways I’ve used Service Broker to date have worked this way.

This is partly because all my Service Broker "applications" have been monologs and the processing they have performed wouldn’t really go any faster if it were done in parallel. In fact, I have never used more than one conversation. So far, I always create a single conversation. I cache the conversation handle in a database table so I can reuse it whenever I send a message. This is because most of the performance impact when using Service Broker comes from creating a conversation. That makes sense since the conversation is actually the unit that has guaranteed in order message delivery. By caching the conversation and never ending it or starting a new one, you improve the performance. This is discussed in an article called Reusing Conversations. You have to consider some scalability concerns before deciding to use exactly one conversation. For me however, I felt that was good enough (not to mention simple).

For me, the receiving application is a Windows Service written in C#. It creates a background thread that loops indefinitely issuing a WAITFOR( RECEIVE ... ) command to SQL Server with a half second or so timeout. This setup allows me to send messages serially and receive and handle them serially as well. If I needed to send messages in parallel I’d have to use more than one conversation. I believe I could still receive the messages serially, but I’d have to be very careful with when the conversation lock was released, as discussed in this msdn article, Conversation Group Locks.
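
Here's a stripped-down sketch of that kind of listener, with an invented queue name and none of the real error handling, transactions, or message processing a production service would need:
// A rough sketch of the receive loop described above: a background thread
// blocking on WAITFOR(RECEIVE ...) with a short timeout. "TargetQueue" is a
// made-up queue name; a real service would process messages inside a
// transaction and end conversations when appropriate.
using System;
using System.Data.SqlClient;
using System.Threading;

class QueueListener
{
    private readonly string connectionString;
    private volatile bool running = true;

    public QueueListener(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public void Start()
    {
        new Thread(Listen) { IsBackground = true }.Start();
    }

    public void Stop() { running = false; }

    private void Listen()
    {
        const string sql = @"
            WAITFOR (
                RECEIVE TOP (1)
                    message_type_name,
                    CAST(message_body AS NVARCHAR(MAX)) AS body
                FROM TargetQueue
            ), TIMEOUT 500;";

        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            while (running)
            {
                using (SqlCommand cmd = new SqlCommand(sql, conn))
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        string messageType = reader.GetString(0);
                        string body = reader.IsDBNull(1) ? null : reader.GetString(1);
                        // Handle the message here.
                        Console.WriteLine("{0}: {1}", messageType, body);
                    }
                }
            }
        }
    }
}
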

Hopefully this summary has shown how flexible and powerful Service Broker is. And also how easy it is, despite the fact that the learning curve is pretty high and the setup curve is pretty high too (you have to create at least 6 different new database objects at the bare minimum to get a Service Broker conversation going). With luck SQL Server 2008 won’t deprecate all this information.
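
To make that "6 different new database objects" line concrete, here is roughly the minimal setup, shown as the DDL a small (hypothetical) C# helper might run. Every name is invented, and in practice you'd more likely just run these statements from a SQL script:
// A sketch of the bare-minimum Service Broker setup: one message type, one
// contract, two queues, and two services (all names invented).
using System.Data.SqlClient;

class ServiceBrokerSetup
{
    private static readonly string[] Ddl =
    {
        "CREATE MESSAGE TYPE [SampleMessage] VALIDATION = NONE",
        "CREATE CONTRACT [SampleContract] ([SampleMessage] SENT BY INITIATOR)",
        "CREATE QUEUE SenderQueue",
        "CREATE QUEUE TargetQueue",
        "CREATE SERVICE [SenderService] ON QUEUE SenderQueue",
        "CREATE SERVICE [ReceiverService] ON QUEUE TargetQueue ([SampleContract])"
    };

    public static void Run(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (string statement in Ddl)
            {
                using (SqlCommand cmd = new SqlCommand(statement, conn))
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}

That's one message type, one contract, two queues, and two services, which is where the count of six comes from.
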

Thursday, December 27, 2007

Unit Testing and TDD

Unit Testing and Test Driven Development have been around for quite a while. However, they're still only beginning to become a commonly seen practice in industry programming, partly because of the influence of Agile Programming and Dynamic Languages.

A unit test is very simple (and it has nothing to do with any manner of human body part). From a code perspective a unit is a public class with public properties and methods. From a design perspective a unit is a "high level" function or behavior or task or object. A unit test is a method that tests a certain behavior of the unit. Thus one unit will have many unit tests, each one testing a different part of its behavior.

Sometimes unit tests are written by "QA" people after the developers have finished writing the code. Other times, unit tests are written by the developers themselves after the code has been written. Both of these approaches are terribly boring, inefficient, and ineffective.

Test Driven Development is the practice of writing unit tests BEFORE production code. It goes like this:
  1. Write a list of things to test
  2. Pick a test and write it before you've written the production code
  3. Write the minimum amount of code needed to pass the test
  4. Refactor the production code AND test code, if needed
  5. Repeat

This is typically referred to as the Red/Green/Refactor pattern. It’s very festive and Christmas appropriate. First, you write up a list of tests. Then you decide what test you want to code first. Being a lazy computer geek who would rather IM the guy in the other room than actually get up out of your chair and (*gasp*) walk, you pick the easiest test first. Since you haven't written the code that the test is supposed to test, your test probably won’t compile. So, you write the minimum amount of code needed to get it to compile by defining the class you invented out of thin air in the test and any properties or methods you may have whimsically pretended it had. Now you run the test and it fails since none of those methods or properties actually do anything yet (red). Now that you see red, you can actually write some code. You write the minimum amount of code needed to get the test to pass (green). Then you survey your code and your tests and refactor as needed, re-running the tests when you're done. At this point you enhance your list of things to test, in case you thought of something new, and pick another test.

Every line of code you write is directly driven by a unit test (or a refactoring): Test Driven Development.

If you think I’m overemphasizing the minimum concept above, wait until you read your first TDD code sample. The first one I saw tested a Queue’s IsEmpty method. The test was that when the Queue is first created, IsEmpty returns true. The code they wrote to pass this test was:
public bool IsEmpty()
{
    return true;
}

So when TDD people say do the minimum, they really mean it.
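
For completeness, the NUnit test driving that code would look something like this. The Queue here is the hand-rolled class being test-driven (not System.Collections.Queue), and the names are invented for illustration:
using NUnit.Framework;

// The unit under test: the "minimum code to pass" from the example above.
public class Queue
{
    public bool IsEmpty()
    {
        return true;
    }
}

[TestFixture]
public class QueueTests
{
    [Test]
    public void IsEmptyReturnsTrueForNewQueue()
    {
        Queue queue = new Queue();
        Assert.IsTrue(queue.IsEmpty());
    }
}

The next test on the list (say, that IsEmpty returns false after an Enqueue) is what finally forces a real implementation.
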

It sounds quite simple and for the most part it is. However, there are some guidelines which will help make the unit tests more useful.
  1. Tests must be fully automated requiring no external input or setup that is not performed by the tests themselves.
  2. Don't write code unless you have a failing test. Your tests define the code's requirements.
  3. Each test should test as little as possible. That is, be as specific as possible and test exactly one feature of the unit. Then, when a test fails, you know exactly what is wrong.
  4. Tests should exercise only the unit being tested and not any of that unit's dependencies (ex: database, other units, etc). This is accomplished through dependency injection and mock objects.
  5. Test code is not a second-class citizen; it is just as important as production code and should be treated as such.


The main question to ask about Unit Testing and TDD is: does it help? Well, help with what? Bugs? Time to develop? Maintenance? Code design?

Here are some of the pros I’ve seen while doing TDD:
  1. Fewer bugs: Thinking through what tests to write, and actually writing them, frequently reveals other test scenarios I would have missed. This is partly because I’m not doing "gunslinger" development and partly because you are using your classes before you’ve written them.
  2. Regression testing: Worried that code you wrote three months ago still works? Run the tests. Worried you forgot something during your code refactoring? Run the tests. Worried you broke your public interface by accident? Run the tests.
  3. Easier debugging: The test tells you exactly what is wrong and you can frequently tell exactly what LINE of code is wrong.
  4. Improved code design: Because you need to test one unit at a time, you naturally design your code into smaller, “DRY”-er, more orthogonal, and more composable units. You also tend to design around algorithms instead of UIs.
  5. Increased confidence: Because everything is tested and tests are fast and easy to run, you KNOW your code works. When your boss asks, "Does this work?" you don’t have to say, "Last time I checked..." The last time you checked was immediately after your last build. "It works."
  6. One development style: Whether you’re writing new code, or prototyping, or debugging, or adding features, or working on someone else’s code, it’s the same process: Red/Green/Refactor.
  7. Feeling of "making progress": You’re constantly finishing small units of work in the form of tests and getting instant feedback that the work is correct. This means you don’t have to go days hacking on some deeply embedded code, hoping you’re getting it right.

What are the cons?
  1. At least twice as much code: Tests require lots of code. You have to write it, there is no avoiding it.
  2. Tests take time: Writing all those tests takes time.
  3. Test Maintenance: You have to keep that test code up to date if you want it to be worth anything.
  4. Dependency Injection can get ugly and it can be hard to know where to draw the line, as I discussed earlier.
  5. Some things can’t be tested: You can’t test GUIs. There are also things that are just too hard to test. Asynchronous timing issues for example.
  6. Code has to be written to be tested: You can’t test just any arbitrary code. You can test parts of it in some cases, if you’re willing to deal with the dependencies it may have. But to test it "properly" it has to be written with testing in mind.

So is it worth it? Do the pros outweigh the cons? So far, I think they do. I even think that in the long run, writing unit tests may actually save time because they make it so much easier to update code and verify that code is working correctly months and years later. They also make it easier for multiple developers to work on the same code at different times. Also, if you’re doing proper TDD, then maintaining the tests isn’t an issue. And you can address old code that isn’t tested by refactoring it and testing the refactored portions little by little as you fix bugs and add features. And while you may not be able to click a button on a form, you can call the method that the button’s click event handler would have called.

All that being said, this is still computer programming. There is no silver bullet. But TDD seems to get between 70% and 80% of the way there, and that’s certainly better than randomly clicking around your Forms...

Finally, if you're completely new to this stuff you're going to want to check out nunit for unit testing and nmock or Rhino.Mocks for creating mock objects, all of which are free.

And now we’ve reached the part of the post where I ask what you think about Unit Testing and TDD. Do you agree with my take on it? Do you have stuff to add?

Thursday, December 20, 2007

Dependency Injection

Dependency Injection is an "object oriented design pattern" for creating loosely coupled objects. It has nothing to do with needles, and if it makes you think of the Breakfast Club that's not my fault.

Let's say you're writing a class whose job is to find a certain configuration file. It's going to look in a predetermined expected location, and if it doesn't find it there it will ask the user to select it. This object is dependent on the user. And in code, it is dependent on some kind of UI for asking the user for the file.

If we're in C#, you can write this class so it creates an OpenFileDialog (which doesn't have the same memory-leak-causing behavior as a regular form, by the way) and displays it to the user. However, this would not be a loosely coupled design. What if you wanted to change this to use the Vista open file dialog? Or what if you wanted to make it force the user to select a valid file by continually asking them until they select one? You'd have to edit the main class. And you'd have to be careful that the user-prompting logic didn't become intertwined with the rest of the class's logic.

What can we do? Let's inject our class. Let's inject it with dependencies. Or more accurately, classes that take care of its dependencies for it.

Create an interface like IPromptUserForConfigFile which contains a method that returns the path the user selected as a string. Now modify the constructor of your main class to accept an instance of an IPromptUserForConfigFile class. The main class simply calls the method on this interface, all the details of how the user is prompted are abstracted away. Plus you can change how the user is prompted at any time by simply passing in a different class that implements IPromptUserForConfigFile.
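
Here's a minimal sketch of what that might look like. IPromptUserForConfigFile comes from the description above; the other names (OpenFileDialogPrompt, ConfigFileLocator, and the expected-path handling) are invented for illustration:
// A minimal constructor-injection sketch of the config file finder described above.
using System.IO;
using System.Windows.Forms;

public interface IPromptUserForConfigFile
{
    // Returns the path the user selected, or null if they cancelled.
    string PromptForConfigFile();
}

// One possible implementation, backed by the standard WinForms dialog.
public class OpenFileDialogPrompt : IPromptUserForConfigFile
{
    public string PromptForConfigFile()
    {
        using (OpenFileDialog dialog = new OpenFileDialog())
        {
            return dialog.ShowDialog() == DialogResult.OK ? dialog.FileName : null;
        }
    }
}

public class ConfigFileLocator
{
    private readonly string expectedPath;
    private readonly IPromptUserForConfigFile prompt;

    public ConfigFileLocator(string expectedPath, IPromptUserForConfigFile prompt)
    {
        this.expectedPath = expectedPath;
        this.prompt = prompt;
    }

    public string FindConfigFile()
    {
        // Look in the predetermined location first; fall back to asking the user.
        return File.Exists(expectedPath) ? expectedPath : prompt.PromptForConfigFile();
    }
}

In a unit test you'd hand the ConfigFileLocator a fake IPromptUserForConfigFile instead of the dialog-backed implementation, so no UI ever pops up.
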

Dependency injection seems pretty simple. It gives you loosely coupled objects, which are very orthogonal, and possibly best of all it makes it easy to unit test with the help of a library like nmock.

What are the downsides? You now have an interface and a class that you otherwise probably wouldn't have created (since you haven't actually had any practical DEMAND for loose coupling yet). And you also have to construct a class that implements the interface and pass it into the main object's constructor. There are libraries to help with that part, like Spring.NET. I've never personally used them, but they exist. Actually, when I've used this pattern I've built a default implementation of the interface into the main class, and instead of passing the interface object in through the constructor I let you override the default with a property. This clearly violates some of the loose coupling, but to be honest I'm really not using this pattern for the loose coupling, I'm using it so I can Unit Test.
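
A sketch of that variation, reusing the interface and OpenFileDialogPrompt from the earlier example:
// The same ConfigFileLocator reworked to use a built-in default plus a
// property override instead of a constructor parameter (names carried over
// from the previous sketch).
public class ConfigFileLocator
{
    private readonly string expectedPath;

    // Default implementation baked in; tests swap it out through the property below.
    private IPromptUserForConfigFile prompt = new OpenFileDialogPrompt();

    public ConfigFileLocator(string expectedPath)
    {
        this.expectedPath = expectedPath;
    }

    public IPromptUserForConfigFile Prompt
    {
        get { return prompt; }
        set { prompt = value; }
    }

    public string FindConfigFile()
    {
        return System.IO.File.Exists(expectedPath) ? expectedPath : prompt.PromptForConfigFile();
    }
}
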

The biggest downside here would seem to be that the size of your code base has increased. And this has happened to enable loose coupling, but mostly to allow for Unit Testing. Is this a good thing?

When I first started learning about Unit Testing my biggest worry, apart from how much more code I'd be writing and maintaining, was that I would start writing strangely designed code just so I could Unit Test it and not because it was the right design for the situation. You can imagine all kinds of terrible things. Classes with all public methods. Classes with only a single static method that is only used by one other class. Classes with tentacles and suction cups and slimy oddly named and mostly irrelevant properties.

However, I was surprised at how often my design changed for the better due to thinking about testing. My first few unit testing attempts didn't result in any tentacles or other horrible things. That is, until I encountered dependency injection. "Look! I'm making my code loosely coupled!" I thought to myself, "That's a good thing! This Unit Testing really works!"

But dependency injection is a very slippery slope. After all, the majority of classes you write are riddled with dependencies. Things like databases, framework code, web services, user interfaces, other code libraries you've written, etc. Should you seriously create a class to wrap each one of these and then pass those classes into other classes that apply the "business logic" and "glue" between how those dependencies are used? Imagine how huge your code base would be then! And you certainly can't create static classes if you're planning on passing them into another class through a constructor. And of course now that you've created all these objects that wrap various dependencies, you're going to want to reuse them. In .NET that means putting them in their own assemblies. Are we going to end up with 500 assemblies with complicated dependencies? Where do we draw the line?

An object mocking framework like TypeMock (not free!) would help alleviate the need for all the "injection" going on here just for the sake of testing, though you'd still need to create many objects. I need to look into that product more to decide if it's worth the cost (it's quite expensive). If it is, then the line can be drawn with a question, "Do I need this to be loosely coupled?" If it isn't, then the line needs to be drawn with a different question, "Do I need to mock this for testing or to have it be loosely coupled?" The first question is pretty easy to answer. The second one... maybe not so much.

Let's end with a string of questions. Do you unit test? Do you mock objects? What mocking library do you use? How does dependency injection make you feel?