Thursday, May 24, 2012

Hg Bookmarks Made Me Sad

Branches
Hg's branches are "permanent and global", meaning they are fixed to the changeset, can't be deleted, and can't be left behind when you push changes on that branch.

This is in contrast to git's branches, which are temporary and are not part of the changesets.  I think of them as pointers.

It can be nice to have branch names on your commits, because it adds some meaningful context to the commits.  It makes understanding your history very easy.  The only downside that I am aware of is the potential for name collisions.  Someone might try to create a branch using a name that someone else had already used, in which case you should really just use a different name...  If there are other downsides, I don't know what they are.

Workflow
However, the Mercurial team has always recommended that named branches be used for long-lived branches, not short-term topic branches.  Pre-2.0 they probably would have recommended using local clones; now they recommend using Bookmarks.  I've found local clones less than ideal for my workflow, which typically looks like this:
  1. Get latest changes
  2. Decide what task to work on next
  3. Create a branch-line to work on that task
  4. Hack hack hack
  5. If it takes a while (>8hrs), merge default into the branch-line
  6. If I have to go home for the night and I'm not done, push branch to remote as backup
  7. Push branch-line to remote, have someone do a code review
  8. Land branch-line on default, close branch-line
Some things to note about this:
  • The branches are small and generally short lived, one per "topic"
  • I want to push them to the remote for backup purposes (in case my computer fries over night or something)
  • I want to push them to remote so others can collaborate
This is why named-branches are so much more convenient for me than local clones.
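To make this concrete, here's roughly what that workflow looks like in hg commands (the branch name is just a made-up example):

hg pull -u                          # 1. get latest changes
hg branch fix-login-timeout         # 3. create a branch-line for the task
# ... hack hack hack, committing as I go ...
hg pull                             # 5. if it drags on, merge default in
hg merge default
hg ci -m "merge default"
hg push --new-branch                # 6/7. publish the branch-line for backup and code review
hg up default                       # 8. land it on default
hg merge fix-login-timeout
hg ci -m "merge fix-login-timeout"
hg up fix-login-timeout             # ... and close the branch-line
hg ci --close-branch -m "done"
hg up default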

However, in a recent release of Mercurial, they added a new notification message when you create a branch which says: "(branches are permanent and global, did you want a bookmark?)"  So they couldn't be much clearer or more in-my-face about the fact that they think I should be using bookmarks for my workflow instead of named-branches.

Bookmarks
Bookmarks are basically Mercurial's version of git's temporary short lived branches.  It means I'll lose the nice branch names on my commits in history.  But I won't have to worry about name conflicts.  This already doesn't seem like a worthwhile trade, but I'm willing to take the Mercurial devs' word for it and try it out.  Sadly I found them, in their current state (2.2.1), to be bug-prone and impractical.  For the remainder of this post, I'd like to explain what I don't like about them as they are now.  But since I don't want to have to explain the basics here, you should go read about them first: http://mercurial.selenic.com/wiki/Bookmarks.  I'd like to throw this one little caveat in before I start, which is to say that it's totally possible I am misusing these things.  I sincerely hope that's the case and someone will point out a better way to me.  But I couldn't find any good real workflow examples of bookmarks, so I had to figure it out on my own.

Must create my own 'master' bookmark, everyone on the team must use this bookmark
When I create my first bookmark and commit on it, I've just created two heads on the default branch. I can easily find the head of my topic branch, since it has a bookmark, but how do I find the head of the mainline?

Worse, say I publish the bookmark to the remote server and you do a pull -u.  You will be updated to my topic bookmark, because it's the tip.  That is NOT what either of us wanted.  I created a topic branch because I didn't want it to get in your way.  In fact, you shouldn't have to be aware of my branch at all!

So bookmarks are broken before we even get out of the gate.  The workaround is to create a 'master' bookmark that points at the mainline head.  Everyone on the team will have to be aware of this bookmark, and they'll have to be careful to always update to it.
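In practice the workaround looks something like this (a sketch, not an official recipe):

hg up default              # make sure I'm on the mainline head
hg bookmark master         # create the shared 'master' bookmark
hg push -B master          # publish it to the remote

# and from then on, everybody updates like this:
hg pull
hg up master               # instead of trusting pull -u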

Must merge 'master' with 'master@1' and bookmark -d 'master@1'
The next problem happens when you and I have both advanced the master bookmark.  In the simplest case, maybe we both just added a new changeset directly on master.  Let's say you push first, and I pull.  If we weren't using bookmarks, hg would notify me when I pulled that there were multiple heads on my branch and it would suggest I do a merge.  So I'd merge your update with my update and be back to only one head on the branch.

With bookmarks, it's more confusing.  Hg will notify me that it detected a divergent bookmark, and it will rename your master bookmark to master@1 and leave it where it was.  It will leave mine named master and leave it where it was.  Now I have to "hg merge master@1; hg bookmark -d master@1;"

As a side note here, I was curious how git handles this problem, since git's branches are implemented so similarly to hg's bookmarks.  The core difference is that git won't let you pull in divergent changes from a remote into your branch without doing a merge.  It's conceptually similar to renaming the bookmark to master@1, since what git technically does is pull the changes into a "remote tracking branch" (that's a simplification, but close enough), and then merge that remote tracking branch onto your branch.  But it has a totally different feel when you're actually using it.

Can't hg push, or it will push my changes without my bookmark
This is the most devastating issue.  If I have created a new topic bookmark and committed on it, and then I do "hg push", it's going to push my changes to the remote without my bookmark!  The bookmarks only get pushed when you explicitly push them with "hg push -B topic".  Which means if I'm using bookmarks, I can't ever use the hg push command without arguments, or I'm going to totally confuse everyone else on the team with all these anonymous heads.

It's true that as long as the team is using the master bookmark and their own topic bookmarks, they shouldn't really have any problems here...  But it's still totally confusing, and totally not what I wanted.

Suggestions
The Mercurial team feels very very very strongly about maintaining backwards compatibility.  So it's probably a pipe dream to hope that this might change.  But I have two suggestions on how these problems might be mitigated.  These suggestions probably suck, but here they are anyway.

Hg up should prefer heads without bookmarks
If I do a pull -u and it brings down a new head, but that head has a bookmark, hg up should update to the head WITHOUT the bookmark.  This would allow me to use bookmarks without them getting in the way of other members of the team.

I think it would also allow me to not have to create the 'master' bookmark.  When I wanted to land a topic bookmark, I would just do: "hg up default; hg merge topic; hg ci -m "merged topic";"  Since "default" is the branch name, hg would prefer the head without bookmarks, which would be the mainline.

Hg push should warn if pushing a head with a bookmark
This would be consistent with hg's treatment of branches.  When you hg push, if you have a new branch, it aborts and warns you that you're about to publish a new branch.  You have to do hg push --new-branch.  I think it should do the same thing for bookmarks.  This would prevent me from accidentally publishing my topic bookmarks.

I <3 Hg
I really like Mercurial.  Even in the hg vs. git battle, I tend to prefer hg.  I love how intuitive its commands are, I love how awesome its help is, I love its "everything is just a changeset in the DAG" model (vs. git's "you can only see one branch at a time, what's a DAG?" model).  And that's why bookmarks are making me sad.  Every time I create a branch, hg tells me I'm doing it wrong, but bookmarks are way too unfriendly right now (unless I'm missing something huge [it wouldn't be the first time]).

I still strongly recommend Hg.  If you're still using CVS, or Subversion, or heaven help you TFS, you should take a look at Mercurial.

And if you're a Mercurial expert (or a Mercurial developer!) please help me understand how to use bookmarks correctly!

PS.  I thought about drawing little graph pictures to help explain the issues I laid out here, but I don't have a decent drawing tool at my disposal, and I didn't think this rant really deserved any more time than I already put in.  Hopefully you were able to make sense out of all these words.

Monday, May 21, 2012

Minor Issues: Constructors and MVC Controllers

Recently I've been getting into F# a little.  It's a really cool language which has been broadening my perspective on problem solving with code.  It's a .NET language that interoperates with C#, but it does do some things differently.  For example, it has more rigorous rules around constructors:
open System

type Point(x : float, y : float) =

  member this.X = x
  member this.Y = y

  new() = new Point(0.0, 0.0)

  new(text : string) =
    let parts = text.Split([|','|])
    let x = Double.Parse(parts.[0])
    let y = Double.Parse(parts.[1])
    new Point(x, y)
It may not immediately jump out at you, but there are some really cool things here that C# doesn't do (or at least doesn't enforce):
  1. This actually defines 3 constructors, the "main" constructor is implicitly defined to take in floats x and y
  2. Constructors can call other constructors!  Note the empty constructor, new().
  3. All constructors ARE REQUIRED to call the "main" constructor
I fell in love with this immediately.  This requirement forces me to be keenly aware of what my class's core data really is.  And it communicates that knowledge very clearly to any consumers of the class as well.  This was especially refreshing because I have found that since C# added object initialization syntax (new Point { X = 1.0, Y = 2.0 }) I've started writing far fewer constructors.  Constructors are boilerplate and annoying to type, so I largely stopped typing them.  But now that I have done that for a while and I have a few real world examples of classes without constructors, I find that I miss the constructors.  They communicate something about the core, most important data of the class that a whole lot of properties doesn't communicate.

So, that sounds pretty straightforward, and I should start writing constructors on my classes again.  And if I want to be really strict (which I kind of do), I shouldn't provide a default constructor either.  Then I'll be living in the rigorous world of class constructors, just like F#.
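For comparison, here's roughly the shape I'd like my C# classes to take, funneling everything through one "main" constructor.  C#'s constructor chaining (the : this(...) syntax) gets close, although unlike F# it can't run statements before the chained call:

using System;

public class Point
{
  public double X { get; private set; }
  public double Y { get; private set; }

  // the "main" constructor: this is the core data of the class
  public Point(double x, double y)
  {
    X = x;
    Y = y;
  }

  // other constructors chain to the main one
  public Point() : this(0.0, 0.0) { }

  public Point(string text)
    : this(Double.Parse(text.Split(',')[0]),
           Double.Parse(text.Split(',')[1])) { }
}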

But this is where MVC Controllers finally make their first appearance in this post.  Because these controllers exert their own pressure on my classes and downright require a default (parameter-less) constructor.  At least that's the case with the way my team writes our controllers.  Why?  Here's an example.

Let's talk about CRUD.  Typically there's a controller action we call "New", it returns an empty form so you can create a new record.  This form posts to a controller action we call "Create", which binds the form data to a model, and calls .Save().  We're using MVC's standard form builder helpers, which generate form field names and IDs based on the model expression you provide as a lambda.  This is how it knows how to bind the form back to your model on "Create".  But this means you have to pass an empty model out the door in the "New" to generate the empty form.  An empty model requires a default constructor!  So the code looks like this:
public ActionResult New()
{
  ViewData.Model = new Point();
  return View();
}

[HttpPost]
public ActionResult Create(Point point)
{
  point.Save();
  return View();
}
Obviously, real code is more complicated than that, but you get the idea.

And so I find myself with a minor issue on my hands.  On the one hand, I want to create F# inspired rigorous classes.  But on the other hand I want simple controllers that can just send the Model out the door to the view.  Alas, I can't have both, something has to give.

Obviously I could give up on the Constructors.  Or, I could give up on passing my model directly to the view.  There are other approaches well documented in this View Model Patterns post.  The quick description is I could NOT pass my model, and instead pass a View Model that looks just like my model.  And then I'd have to map that View Model back onto my Model somehow...  But that comes with its own minor issues.
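For example (names made up, just to sketch the idea), the View Model keeps the default constructor and settable properties that MVC model binding wants, and the real Point keeps its rigorous constructor:

public class PointViewModel
{
  // MVC is happy: default constructor and settable properties
  public double X { get; set; }
  public double Y { get; set; }

  // map back onto the "real" model with its rigorous constructor
  public Point ToPoint()
  {
    return new Point(X, Y);
  }
}

[HttpPost]
public ActionResult Create(PointViewModel form)
{
  var point = form.ToPoint();
  point.Save();
  return View();
}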

So how about it?  How do you deal with this issue?

Monday, May 14, 2012

Powershell Listing SQL Table Columns

Powershell has an awesome utility called sqlps that both lets you execute SQL statements against a database and implements a file system provider over SQL Server.

One of the things I use this for all the time is to inspect the columns of a table.  Management Studio's tree view is terrible for this, especially compared to the flexibility of powershell which allows you to do things like:

  1. Sort the columns by name, or by data type
  2. Filter the columns by name, or by data type, or by nullable, etc
Here's a series of commands I use a lot that I thought was worth sharing:
  1. sqlps
  2. cd sql\localhost\default\databases\mydatabase\tables\schema.table\columns
  3. ls | where { $_.Name -notlike 'ColPrefix*' } |  select Name, @{Name="Type"; Expression={"$($_.DataType.Name)($($_.DataType.MaximumLength))"}}, Nullable
That will display all the columns that DO NOT have a name starting with ColPrefix and will show you each column's Name, Data Type (formatted like "nvarchar(255)"), and whether it allows nulls.
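And since the columns are just objects in the pipeline, the sorting and filtering from the list above are one-liners too (nothing fancy, just the standard cmdlets):

ls | sort Name | select Name, Nullable
ls | sort { $_.DataType.Name } | select Name, @{Name="Type"; Expression={$_.DataType.Name}}
ls | where { $_.Nullable } | select Name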

Enjoy!

Tuesday, May 8, 2012

Selfish Programmers: less flame-baity

Last post too flame-baity for you?  Fair enough!

It's far too easy to confuse "easy" with "simple."  Rich Hickey touches on this a bit in this presentation, which is very similar to another talk he gave that I blogged about earlier.  Almost everything he said in these talks was very thought provoking for me.  But the one that really hit home the hardest was this concept of easy vs. simple.

The difference between easy and simple is rather hard to firmly pin down.  One way to think of it might be that easy means less effort.  Fewer keystrokes, fewer concepts.  The less I have to type, the easier it is to do something.  The less I have to know, the easier it is.  The fewer structures between me and what I'm trying to accomplish, the easier.

But easier doesn't necessarily mean simpler.  Hickey associates simpler with un-twisted.  So code that is DRY, and SOLID would be simple.  Even if it requires more keystrokes, classes, and curly braces to write.

I find myself falling for this a lot.  Sometimes it might be simpler to do more work, but that's hard to see.  On the other hand, it's incredibly easy for me to judge how fun something will be for me to do, or how much tedious effort something will require.

The problem is that EASY is about me, where SIMPLE is about my code.  So the deck is stacked against us as software developers.  It's going to be difficult to separate what's easy for us from what's simple for our code and make the right design decision.

Being aware of this distinction is useful.  And I certainly wasn't as aware of it before watching Hickey's talk.  But it does raise an interesting question of how can we keep ourselves honest?  How can we notice when we're doing the easy thing instead of the simple thing?  While at the same time avoiding doing too much and over complicating?

Monday, May 7, 2012

Selfish Programmers

The biggest movement in software today is selfishness.

Ok, I've only been here for a short time, so what do I know, maybe it's always been like this.  And people being selfish doesn't really constitute "a movement" (though I wouldn't be surprised if some people would be willing to argue that our generation is a very selfish one, I'm not sure how you would prove that we're more selfish than previous generations were at our age...).

What DOES constitute a movement is the continuous push toward tools that make a programmer's job "easier."

Yeah, you got that right, I'm about to take a stance against making things easier.  Here are some examples of tech stuff that's supposed to "make things easier":
  • Visual Studio
    • WinForms/WPF designers
    • WCF
    • Solutions and project files
    • Entity Framework (database first)
    • Linq-to-Sql
  • Active Record
  • DataMapper (ORM)
  • Convention over configuration (I'm looking at you Rails)
  • ASP.NET MVC's UpdateModel (and Validations)
  • All Ruby code ever written
This stuff is "easier" because it requires "less work".  Usually boring tedious work.  No one likes boring tedious work (except on Monday morning when you're really tired).  So naturally we try to get rid of that work.  There are different strategies for getting rid of it.  Microsoft likes to get rid of it by replacing it with drag and drop and magic tools that hide the details of what's going on so you don't have to learn anything.  Ruby on the other hand puts extreme focus on minimalism and pretty syntax, favoring as few keystrokes as possible.

But do we think about what that's costing us, or costing our future selves?  Nope!  We're selfish sons of bitches and all we care about is doing stuff that we enjoy and think is "elegant" with as few keystrokes and effort as possible!

We like drag and drop and magic tools because it saves all that time learning things and typing.  Unfortunately, it also dramatically reduces flexibility, so as soon as you step outside the boundary of the demo application, you find yourself stuck.  Now the complexity of your solution skyrockets as you hack around the limitations of the drag and drop and magic.

And we like minimalism, 'cause it feels like simplicity.  Our model not only looks exactly like the database, but it's automatically created by reflecting over the database, and then we send those models directly to the UI and mash them into our views!  IT'S SIMPLE!  Well, it's less code, but is it simple?  You've intimately twisted the knowledge of your database through your entire code base, leaving you with the pressures of physical storage, UI layout, and controllers all tugging on the same objects.  Every time you remove a layer or a concept from your code to make it simpler, you run the risk of tangling concepts and paying for it in the long run (Of course, every time you add a layer or a concept you run the risk of over abstracting, it's a balance).

In conclusion: stop being so selfish!  Sometimes the best way to do something isn't the most fun, or elegant, or easy way.  Toughen up!  Do the work!

Dogma-less TDD and Agile


TDD is about rapid feedback of the code to the developer.  Agile is about rapid feedback of the application to the development team.

Everything else is just BS.

Here are some of the things that fall into the BS category:
  • Up front design
  • "Architecture"
  • Project plans
  • Estimates/Story Points
  • Information Radiators
  • Team Velocity
  • Specifications
  • Code Reviews
  • QA
  • Approval processes
  • 100% Test Coverage
It's not that these things don't have a purpose, or aren't useful.  But they are all afflicted with varying degrees of BS (baseless guessing, built in uncertainty, outright lying, and occasionally even complete denial of reality).  

What most of these things have in common is team organization.  A one man team doesn't need this stuff. But any more than one person, and you require some way of keeping everyone on the same page.  Especially if you are building software that not all of the teammates completely understand.  Without some kind of organization, people would be chasing their own ideas in all different directions.  And since they don't fully understand the "business," those ideas are likely to be wrong (or at least partly wrong).

Thus, teams need a certain amount of BS.  But I think it's important to remember the distinction.  The most important thing to delivering real value is feedback.  Feedback in code.  And feedback in features.  You need the BS, but apply it carefully, and try to keep the BS out of your feedback loops!

Monday, April 30, 2012

A Covert War

A covert war is being waged.  A battle between competing forces.  In all likelihood, it is taking place in your own code base!  The outcome has the potential to change the very way you think and develop!  Its influence will reach all layers of your application, from data access, to domain, to UI, and beyond!  It is the war between databases and objects!

The battle strikes right to the heart of the three elements of Object Oriented Programming:

  1. Encapsulation: data and methods are grouped together
  2. Information Hiding: implementation details are hidden
  3. Inheritance/Polymorphism: an object may derive from another object, inheriting its data and methods; an object may override those methods; and objects can be referred to interchangeably through a uniform interface
The antagonists generally take the form of Object Relational Mappers.  You can find this in both Data Mapper frameworks (like Hibernate) and Active Record frameworks (like Ruby's Active Record).  They are clever.  They come to you with promises of wonderful things: time savings, less maintenance, elimination of boilerplate code, and sometimes even database independence.  Their siren song is hard to resist.

But the developer must be wary!  For while their promises are sweet, their true sinister goal is the complete annihilation of object oriented code.  Perhaps you are skeptical?  To truly see the danger, we must understand how these so called ORMs are eroding the core values of object oriented programming.

Encapsulation
The ORM is leading a strong assault against Encapsulation.  This is possibly its most sinister move.  Encapsulation is about putting data and behavior together.  The ORM doesn't prevent that.  But it does cause you to break data apart into separate objects that you might otherwise have kept together.  It does this because it tricks you into thinking about the constraints of your physical storage instead of thinking about the design of your objects.  Typical examples are objects that represent link tables.  Or parent/child tables represented directly as objects, where the foreign keys are filled in manually by the consuming code, instead of being encapsulated by an object.
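A tiny, hypothetical example of what I mean.  The table-shaped version makes the consuming code wire up the foreign key by hand, while the encapsulated version keeps that relationship inside the parent:

using System.Collections.Generic;

// table-shaped: objects mirror the tables, so consumers fill in foreign keys
public class OrderLine
{
  public int Id { get; set; }
  public int OrderId { get; set; }   // physical storage leaking into the object model
  public string Product { get; set; }
}

// encapsulated: the parent owns the relationship, the key stays an implementation detail
public class Order
{
  private readonly List<OrderLine> lines = new List<OrderLine>();

  public void AddLine(string product)
  {
    lines.Add(new OrderLine { Product = product });
  }
}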

Information Hiding
The ORM also subtly attacks information hiding.  They make you a simple proposition: in return for automatic persistence of your data, all they ask is that your objects publicly expose each data field!  On the surface, it seems so harmless, but it's a cunning attack at Information Hiding.  No longer can your object hide away the details of the data, exposing an abstract interface of methods over that data.  No!  Now all that data is available to anyone!  

You may feel this pain in simple ways.  Like when you decide to change a few booleans to a single string flag.  It seems simple enough, until you discover that, while you weren't looking, a large amount of code has come to depend on those booleans!  Tragic rabbit-hole-refactoring then ensues.

Or the pain may be more subtle, but also more malevolent.  Like when you are forced to write terribly awkward code in the setters of your fields in a vain and complicated attempt to keep your object in a valid state.

Inheritance/Polymorphism
On another front of the war, we find Inheritance being beaten back by the powerful onslaught of the ORM.  Here the ORM is more direct in its methods, going so far as to tell you exactly what "kinds" of inheritance you may use.  And coupling that inheritance directly to the structure of the database tables.  Thus severely limiting any "Open Closed" type flexibility you might have hoped to gain!  And woe to the poor developer that still attempts to leverage inheritance in an extensible type hierarchy sort of way, for the ORM will combat them at every turn!  It's best to just give in, and follow the narrow table-structure-driven inheritance rules the ORM has laid out for you.

Now we see how the ORMs are attacking the core of our object oriented lifestyle.  But, cunning as they are, it may still not be clear what an adverse effect this can truly have on your day to day development experience.  In an attempt to shed some light on this issue, let's broaden our view to some typical coding patterns.

DDD Syndrome
The first, I will call the DDD (Domain Driven Design) Syndrome.  With DDD Syndrome, the developer writes rich domain models, but attempts to persist those same models with the ORM.  The pressures of the ORM end up overpowering the practices of DDD.  A number of symptoms may immediately appear.  For example, you find a model which has been forced to make all its information public, but still attempts to expose an abstract interface.  This leads to confusion in the code base about what methods and properties consumers of the object are really supposed to use.  

You may also see well defined constructors setting up valid state, but at the same time a default constructor which bypasses these rules so the objects can be inflated by the ORM.  Or worse still, overly complex and redundant code that attempts to keep the object in a valid state.  This code frequently appears in property setters.  

And due to the attack on encapsulation, DDD Syndrome often exhibits complex event based code that attempts to keep multiple objects (each representing separate tables) in some consistent state.  This leads to logic being duplicated on different objects, and makes it very difficult to understand how the objects behave.

Service Syndrome
Service Syndrome is the opposite of DDD Syndrome.  Instead of trying to put your logic in the persisted objects, you leave those objects anemic.  This is a clever counter strike against the ORM in some ways, as you are letting it exist, but limiting its involvement.  

Unfortunately, the downside is you've moved your logic out of the objects, and into "services."  Classes that have no state of their own, take persisted record objects as input, and house your logic.  These classes are very procedural in nature and don't take advantage of any of the core OO elements!
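In code, the symptom usually looks something like this (names made up): an anemic record object, plus a stateless service holding the logic the record could have owned:

public class Invoice
{
  // nothing but data, shaped like the table
  public decimal Subtotal { get; set; }
  public decimal TaxRate { get; set; }
  public decimal Total { get; set; }
}

public class InvoiceService
{
  // procedural logic operating on someone else's data
  public void CalculateTotal(Invoice invoice)
  {
    invoice.Total = invoice.Subtotal * (1 + invoice.TaxRate);
  }
}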

Fight On!
This war is still being waged.  For the moment, it appears that the ORMs are winning.  But we must not lose hope!  Micro-ORMs are a recent development, their influence on this battle is not yet fully known.  Perhaps if wielded in the right manner they could allow the developer to relegate persistence concerns out of their true domain objects?  It remains to be seen.

Monday, March 26, 2012

Are You A Hacker Or A Craftsman?

It's usually viewed as an either/or type of thing: Either you're a hacker, or you're a craftsman.  The problem with any discussion around this is that the words have no clear and fixed definition.  They mean something different to everyone.  So conversations about this end up largely unproductive as everyone keeps talking past one another.

Let's start here.  Hackers are generally characterized as rebels quickly banging out some code that does something impressive, but the code is unmaintainable crap that no one else can understand.  And the hacker is generally only concerned with short term goals.  On the other hand, craftsmen are characterized as deliberately and carefully designing beautiful code, but it takes them forever and they're encumbered by things like tests and principles and patterns.  And the craftsman is usually looking ahead to the long term (at least where maintenance is concerned).

I don't think these either/or characterizations are useful.  Here's a completely different take on "Hacker" from Rands in Repose talking about Facebook.  He characterizes hackers as "believing something can always be better", and not accepting limitations preventing them from making it better.  In the positive light, these "hackers" reject restrictive process and seek to be disruptive by doing new and better things.  In a negative light, these "hackers" reject collaboration, are unreasonable, unpredictable, and not motivated by the same goals as the business.

This is not even close to being the same as the colloquial meaning of "hacker," but it's an interesting blend of hacker and craftsman.  It has the hacker's rebellious qualities, combined with the craftsman's broader vision.

And it's here that I think there is an interesting point to be made.  Hackers lose points for not caring about the future, or the long term.  And craftsmen lose points for losing sight of the broader objectives due to their uncompromising attention to code details.

Software development is nothing if not compromise.  The team that shits out awful code quickly gets done first*.  But then can't respond to change.  The perfectionists with 100% code coverage and perfect SOLID code come in last*.  But then can't effectively respond to change.

* Yes, asterisks.  The team that writes the awful code probably won't finish first, because they spend most of their time debugging.  And it's possible the team with 100% code coverage won't finish last, at least that's the argument, which you can believe if you want to (but I think it largely depends on your experience and your tooling).

I think it's pretty clear that there is a time for hacking, and there is a time for craftsmanship.  The real trick is figuring out at any given moment, which mindset you should be in.  And that requires a very broad understanding of the factors of the project, including: the vision of the product, the long term strategy, the short term goals, the complexity of the code, and the likelihood of change.  All of which is subject to change.  So good luck!!

Monday, March 19, 2012

Rebasing Is For Liars

Rebasing is very popular in the Git community.  Of course, I'm primarily a Mercurial guy, but we have rebasing too.  It's a built in extension, you just have to turn it on.
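If memory serves, turning it on is just a couple of lines of hgrc, after which hg rebase and hg pull --rebase become available:

# in your hgrc / Mercurial.ini
[extensions]
rebase =

# then, instead of pull-then-merge:
hg pull --rebase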

What is rebasing?  The typical scenario goes something like this:
  1. You make changes and commit one or more changesets
  2. Meanwhile, other people have committed changes
  3. You pull down those changes
  4. But instead of merging, you rebase
  5. Which detaches your changes from history and reapplies them after the changes you pulled in
People like this because it keeps the history linear, avoiding "merge bubbles."  And certainly linear history is much easier to understand.  

But I have a problem with rebasing: it's lying.  Understanding the context that changes were made in can be very useful, but rebasing rewrites the history, changing the parent pointers, and thereby changing the context.  Lying about what the code looked like when you changed it.

That said, I still use rebase.  But only when my changes are small or inconsequential and I know that the consequences of lying about what the code looked like when I made those changes won't matter at all.  And in those cases, it's nice to reorder the history to be sequential because it does limit the conceptual overhead of understanding those kinds of changes.  But in general, I prefer to see the merges simply because it accurately represents what really happened.

Monday, March 12, 2012

Simple Made Easy


Simple Made Easy
"Rich Hickey emphasizes simplicity’s virtues over easiness’, showing that while many choose easiness they may end up with complexity, and the better way is to choose easiness along the simplicity path."

I absolutely recommend you take the hour to watch this presentation.  It's pretty easy viewing, he's funny, and I found it very influential.

Highlights
"Your ability to reason about your program is critical to changing it without fear."  This has been something I've firmly believed for a very long time, but I love how succinctly Hickey puts it here.  He even has the courage to challenge the two most popular practices of Software Engineering today: Agile, and TDD.  For Agile, he's got this line: "Agile and XP have shown that refactoring and tests allow us to make change with zero impact.  I never knew that, I still do not know that."  Agile is supposed to make the fact of change one of the primary motivators behind how the project is run, but it doesn't really make applying that change any easier in the code...  For TDD he has this wonderful quip:
"I can make changes 'cause I have tests!  Who does that?!  Who drives their car around banging against the guard rails saying, "Whoa!  I'm glad I've got these guard rails!"
He calls it guard rail programming.  It's a useful reminder that while tests are definitely valuable, they can't replace design and thoughtful coding.

Another very enlightening comment he made had to do with the difference between enjoyable-to-write code and a good program.  This rang very true with me, probably because of all the Ruby bigots these days who are obsessed with succinct or "beautiful" code, but are still writing big balls of mud.  Hickey basically said he doesn't care about how good of a time you had writing the program.  He cares about whether its complexity yields the right solution, and can be reasoned about/maintained.

Which leads to another concept he brings up of Incidental Complexity vs. Problem Complexity.  The argument that the tools you choose to use in your software can bring along extra complexity that has nothing whatsoever to do with the actual problem your program is supposed to solve.

Hickey Says I'm Wrong
I just wrote a series of posts where I was attempting to question some of the assumptions behind many of what are commonly considered good design practices in static object-oriented languages today:
  1. Interfaces, DI, and IoC are the Devil
  2. Demonstrating the Costs of DI
  3. Objects or Data Structures
  4. Service Layers or OOP
  5. Header Interfaces or Inverted Interfaces
  6. DIP: Loose or Leaky?
  7. Abstraction: Blessing or Curse?
I covered a lot of stuff in that series.  One of the things I was really challenging is the practice of hiding every object behind an interface.  I argued this indirection just made things more complicated.  At about 50 minutes in, Rich Hickey says every object should only depend on abstractions (interfaces) and values.  To depend on a concrete instance is to intertwine the "What" with the "How" he says.  So, he's saying I'm wrong.

I also talked about how Dependency Injection is leaky and annoying.  But Rich Hickey says you want to "build up components from subcomponents in a direct-injection style, you want to, as much as possible, take them as arguments", and you should have more subcomponents than you probably have right now.  So, yeah, I'm wrong.

I didn't actually blog about this one, but I've certainly talked about it with a lot of people.  I've been a proponent of "service layers" because I want my code to be as direct as possible.  I want to be able to go one place, and read one code file, and understand what my system does.  For example if I send an email when you create a task, I want to see that right there in the code.  But Hickey says it's bad to have object A call to object B when it finishes something and wants object B to start.  He says you should put a queue between them.  So, wrong again!

I'm also a proponent of Acceptance Test Driven Development (ATDD) and writing english specs that actually test the system.  Hickey says that's just silly, and recommends using a rules engine outside your system.  :(

And finally, and this is the biggest one, he says: 
"Information IS simple.  The only thing you can possible do with information is RUIN it!  Don't do it!  We got objects, made to encapsulate IO devices.  All they're good for is encapsulating objects: screens and mice.  They were never supposed to be applied to information!  And when you apply them to information, it's just wrong.  And it's wrong because it's complex.  If you leave data alone, you can build things once that manipulate data, and you can reuse them all over the place and you know they are right.  Please start using maps and sets directly."
Um, yeah, ouch.  I'm an object oriented developer.  I read DDD and POEAA three years ago and got really excited about representing all my information as objects!  We extensively prototyped data access layers, Entity Framework and NH chief among them.  We settled on NH.  Worked with it for a while but found it too heavy handed.  It hid too much of SQL and clung too much to persistence ignorance.  But I couldn't really understand how to use a Micro-ORM like Massive (or Dapper or PetaPoco) because I was too hung up on the idea of Domain Objects.  So we spiked an ORMish thing that used Massive under the covers.  It supported inheritance and components and relationships via an ActiveRecord API.  It gave us the flexibility to build the unit testing I always wanted (which I recently blogged about).  It is still working quite well.  But it's information represented as objects.  So it's wrong...

In case you didn't pick up on it, Rich Hickey wrote Clojure, a functional language.  I don't know anything about functional programming.  I've been meaning to learn some F#, but haven't gotten that into it yet.  So it doesn't really surprise me that Hickey would think everything I think is wrong.  Functional vs. OOP is one of the biggest (and longest running) debates in our industry.  I think it is telling that I've felt enough pain to blog about lots of the things that Hickey is talking about.  But I don't find it disheartening that his conclusions are different than mine.  It is possible that he is right and I am wrong.  It is also possible that we are solving different problems with different tools with different risks and vectors of change and different complexities.  Or, maybe I really should get rid of all my active record objects and just pass dictionaries around!

In any case, this certainly was a very eye opening presentation.

Monday, March 5, 2012

Database Seed Data

Almost two years ago, we started a major new development effort at Pointe Blank.  We had the extreme fortune to be starting some new projects on a completely new platform (namely ASP.NET MVC instead of WinForms), which gave us the opportunity to take all the things we'd learned and start from scratch.

One of the main things we had learned was the importance of designing for automation.  Probably the single most valuable automation related thing we did was automate the scripting of our database.  We are using FluentMigrator to define migrations and psake with sqlcmd to create and drop database instances.  I'm not all that crazy about FluentMigrator (its syntax is overly verbose and its code seems overly complex) but it has worked very well for us so far.

We use this tool in a few different circumstances:

  1. To apply other people's changes to our local dev databases
  2. To migrate our QA environment
  3. To spin up from scratch (and tear down) automated test environments
  4. To deploy database changes to production
Database migrations have made working with our code base so dramatically easier, it's really amazing.  You can spin up a new environment with one command in seconds.  Incorporate other devs' changes without a second thought.

Migration tools pay most of their attention to schema changes, but data gets largely ignored.  So much so that we had to rough in our own approach to dealing with data, which works, but also sucks.  And there doesn't seem to be any clear recommendations for data.  There are some categories of data I'm concerned with:
  1. List Data: these are values that are usually displayed in drop downs and are either not user generated, or there is a default set we want to seed (ex: states, name suffixes, record types, etc)
  2. Configuration Data: this is data that is required for the system to work, (ex: server uris, an email address to send from, labels to display in the app, etc)
There are only two approaches to dealing with seed data that I'm aware of:
  1. Run the schema migrations, then insert all the seed data
  2. Insert the seed data inside the migrations
At first blush #1 seemed easier, so it's the approach we originally took, but it has some drawbacks. The first challenge is to avoid duplicates.  You can do that with IF NOT EXISTS(...) statements, or by inserting Primary Keys and ignoring PK violation errors.  The second is dealing with schema changes to tables with seed data.

For example, suppose you write a migration to create a table, and you write insert statements to add seed data to it.  You have already run your migrations and seeded the database.  Now you write a new migration which adds a new not null column to the table.  This migration will fail because the table already has data from the seeds and the new column requires a value.  In this situation, you're hosed.  You have to write the migration to add data into that column before making the column not allow nulls (or use a default value).  Effectively you have to duplicate the seed data for that column in both the migration and the seed data file.  You need it in both places in order to support creating the database from scratch and migrating an existing database.

#2 seems to have its own downsides too.  There is no one place that shows you all the data that will be seeded in a given table, it's spread throughout the migrations.  It precludes you from having a schema.rb type of system (unless that system could somehow be smart enough to include data).  That point is somewhat academic at this point though, because FluentMigrator doesn't have anything like this.

However, #2 is much more flexible and can accommodate anything you could ever dream of doing with data (that SQL would support of course).  And I feel that pure #2 would be better than a #1/#2 hybrid, because it's just more straight forward.
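For what it's worth, approach #2 with FluentMigrator looks roughly like this (the table and values are made up, and I'm going from memory on the API, so treat it as a sketch):

using FluentMigrator;

[Migration(201205071200)]
public class SeedNameSuffixes : Migration
{
  public override void Up()
  {
    // seed data lives right next to the schema changes that need it
    Insert.IntoTable("NameSuffix").Row(new { Code = "JR", Description = "Junior" });
    Insert.IntoTable("NameSuffix").Row(new { Code = "SR", Description = "Senior" });
  }

  public override void Down()
  {
    Delete.FromTable("NameSuffix").Row(new { Code = "JR" });
    Delete.FromTable("NameSuffix").Row(new { Code = "SR" });
  }
}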

And now I'd like to ask you, do you know of a good way of dealing with seed data?  Are there any existing tools or strategies out there that you could link me to?  

Monday, February 27, 2012

Better mocking for tests

The common approach I've seen to TDD and SOLID design in static languages requires a lot of interfaces and constructors that accept those interfaces: dependency injection.

If you only develop in a static language, you probably don't see anything wrong with this.  Spend some time in a dynamic language though, and these overly indirect patterns really start to stand out and look like a lot of extra work.

Here's the rub.  When your code legitimately needs to be loosely coupled, then IoC, DI, and interfaces rock.  But when it doesn't need to be loosely coupled for any reason OTHER THAN TESTING, why riddle the code with indirection?

Indirection costs.  It takes time, adds significantly more files to your code base, adds IoC setup requirements, forces more complicated design patterns (like Factories), and most importantly makes code harder to understand.

Testing is important.  So if we have to make our code more abstract than strictly necessary to test, it's worth it.  But there are techniques other than DI that we can use to make our code unit testable while still reducing the amount of abstraction!

Limited Integration Testing
The first way to unit test without interfaces is to not mock out the object-under-test's dependencies.  To not unit test.  I'm not talking about doing a full stack test though.  You can still mock out the database or other third party services, you just might not mock out your object's direct dependencies.
A -> B -| C
  -> D -| E
Here A depends on B and D which depend on C and E.  We've mocked out C and E but not B and D.

The benefit of this approach is that there is absolutely no indirection or abstraction added to your code.  The question is what should you test?  I'm still thinking of this as a unit test, so I really only want to test class A.  It's still going to be calling out to B and D though.  So what I want to do is write the test so that it's focused on verifying the behavior of A.  I'll have separate unit tests for B and D.

I'll reuse the example from Demonstrating the Costs of DI again.  If I wanted to test PresentationApi I could mock out the database layer but still let it call into OrdersSpeakers.  In my test I need to verify that PresentationApi did in fact cause the speakers to be ordered, but I don't care if they were ordered correctly.  I can do this by verifying that the speaker I added has its Order property set.  I don't care what it is set to, as long as it's set I know PresentationApi called OrdersSpeakers.  The OrdersSpeakers tests will verify that the ordering is actually done correctly.
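Sketched out (the setup is elided, and the exact shape of Speaker and PresentationApi is assumed from that earlier post), the test only asserts on PresentationApi's behavior:

[Test]
public void TestAddSpeakerOrdersTheSpeakers()
{
  // ... setup: mock out the database layer, but use the real OrdersSpeakers ...
  var speaker = new Speaker { Name = "Some Speaker" };

  presentationApi.AddSpeaker(speaker);

  // I don't care what the order is, only that ordering happened
  Assert.IsNotNull(speaker.Order);
}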

The downside of this technique is that your test must have some knowledge about the object-under-test's dependencies and frequently those objects' dependencies.  You might expect these tests to be brittle, but surprisingly I haven't had any problems with that.  It's more just that to conceptually understand and write the test, you have to think about the object's dependencies.

Static Funcs
I learned this technique from a blog post Ayende wrote some time ago.  And he recently showed another example.  In this technique you expose a lambda method which is set to a default implementation.  The system code calls the lambda which wraps the actual implementation.  But the tests can change the implementation of the lambda as needed.

I'll use an example I think is probably common enough that you'll have had code like this somewhere: CurrentUser.
public static class CurrentUser
{
  public static Func<User> GetCurrentUser = GetCurrentUser_Default;

  public static User User { get { return GetCurrentUser(); } }

  private static User GetCurrentUser_Default()
  {
    // ... does something to find the user in the current context ...
  }

  public static void ResetGetCurrentUserFunc() { GetCurrentUser = GetCurrentUser_Default; }
}

[TestFixture]
public class DemoChangingCurrentUser
{
  [TearDown]
  public void TearDown() { CurrentUser.ResetGetCurrentUserFunc(); }

  [Test]
  public void TestChangingCurrentUser()
  {
    var fakeCurrentUser = new User();
    CurrentUser.GetCurrentUser = () => fakeCurrentUser;
    
    Assert.AreEqual(fakeCurrentUser, CurrentUser.User);
  }
}
The previous approach could not have handled a case like this, because here the test actually needs to change the behavior of its dependencies.  The benefit of this approach is that we have poked the smallest possible hole in the code base to change only what the test needs to change.  And also, the added complexity is constrained within the class it applies to.

But the biggest downside is exemplified by the TearDown method in the test.  Since the func is static, it has to be reset to the default implementation after the test runs, otherwise other tests could get messed up.  It's nice to build this resetting INTO your test framework.  For example, a test base class could reset the static funcs for you.
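Something as small as this in a common test base class does the trick (assuming NUnit and the CurrentUser class above):

using NUnit.Framework;

public abstract class TestBase
{
  [SetUp]
  public void ResetStaticFuncs()
  {
    // put every static func back to its default before each test runs
    CurrentUser.ResetGetCurrentUserFunc();
  }
}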

Built In Mocking
This is by far my favorite technique, which is why I saved it for last ;)  I first saw this in practice with the Ruby on Rails ActionMailer.  The idea is that you don't actually mock, stub, or substitute anything.  Instead the framework you are calling into is put in test mode.  Now the framework will just record what emails were sent but not actually send them.  Then the test just queries on the sent mails to verify what mail was attempted to be sent.

In the case of ActionMailer, every email you try to send gets stored in an array, ActionMailer::Base.deliveries.  The tests can just look in this array and see if the expected email is there, as well as verify that the email is in the correct format.

With the right mindset this can be applied in all kinds of places.  The most powerful place I've found is to the database layer.  The database access layer we wrote at Pointe Blank includes this style of mocking/testing.  So for example, a typical unit test that mocks out the database in our codebase might look something like this:
[Test]
public void TestAddSpeaker()
{
  ...
  presentationApi.AddSpeaker(speaker);

  TestLog.VerifyInserted(speaker);
}

The upsides to this technique are really awesome.  First off, there are no interfaces or dependency injection between the code and the persistence layer.  Yet the test can still "mock out" the actual persistence.

Secondly, there is no ugly mocking, stubbing, or faking code in your tests.  Instead of testing that certain method calls occurred, you are simply verifying results!  By looking in the TestLog, you're verifying that the actions you expected really did occur.

Thirdly, the TestLog becomes a place where a sophisticated query API can be introduced allowing the test to express exactly what it wants to verify as succinctly as possible.

Fourthly, this code is completely orthogonal to any changes that may happen in how a speaker is saved, or how the actual persistence framework works.  The code is only dependent on the TestLog.  This means the test will not break if irrelevant changes occur in the code.  It is actually written at the correct level of abstraction, something that is frequently very difficult to achieve in unit tests.

The only downside isn't even a downside to this technique.  The downside is that since built in mocking is so great, you may have a tendency to use the limited integration testing approach more than you should.  You may find yourself integration testing everything down to the database layer because the database layer is so easy to mock out.  Sometimes that's OK, but sometimes you really shouldn't test through a class dependency.

So those are three techniques I've discovered so far for writing unit tests without riddling my code with interfaces, dependency injection, and IoC. Applying these allows me to keep my code as simple as possible. I only need to apply indirection where it is actually warranted. There is a trade off, as I pointed out in the downsides of these techniques, namely in that your tests may be a bit more error prone. But I feel those downsides are more than acceptable given how much simpler the overall code base becomes!

Monday, February 20, 2012

Meaning Over Implementation


Conveying the meaning of code is more important than conveying the implementation of code.

This is a lesson I've learned that I thought was worth sharing.  It may sound pretty straight forward, but I've noticed it can be a hurdle when you're coming from a spaghetti code background.  Spaghetti code requires you to hunt down code to understand how the system works because the code is not 'meaningful.'

There are really two major components to this.  The first is, if you're used to spaghetti code, you've been trained to mistrust the code you're reading.  That makes it hard to come to terms with hiding implementation details behind well named methods, because you don't trust those methods to do what you need unless you know every detail of HOW they do it.

The second is in naming.  Generally I try to name methods so that they describe both what the method does as well as how it does it.  But this isn't always possible.  And when it's not possible, I've learned to always favor naming what the method is for as opposed to how the method is implemented.

There are a number of reasons why this works out.  The first is that issue of trust.  If your code base sucks, you're forced to approach every class and every method with a certain amount of mistrust.  You expect it to do one thing, but you've been bitten enough times to know it's likely it does some other thing.  If this is the kind of code base you find yourself in, you're screwed.  Every programming technique we have at our disposal relies on abstraction.  And abstraction is all about hiding complexity.  But if you don't trust the abstractions, you can't build on them.

The other reason this works is the same reason SRP (the Single Responsibility Principle) works.  You can selectively pull back the curtain on each abstraction as needed.  This allows you to understand things from a high level, but then drill down one layer at a time into the details.  I like to think of this as breadth-first programming, as opposed to depth-first programming.

So when you're struggling with naming things, remember your first priority should be to name things so that others will understand WHAT it does.  They can read the implementation if they need to understand HOW it does it.

Friday, February 17, 2012

Abstraction: blessing or curse

Well this has been a fun week!  It's the first time I've ever done a series of posts like this, and all in one week.  But now it's time to wrap up and then rest my fingers for a while.

All of this stuff has been about dealing with abstraction.  Seriously, think about what you spend most of your time doing when you're programming.  For me, I think it breaks down something like this:
  • 50%: Figuring out why a framework/library/control/object isn't doing what I wanted it to do
  • 30%: Deciding where to put things and what to name them
  • 20%: Actually writing business logic/algorithms
So, yeah, those numbers are completely made up, but they feel right...  And look at how much time is spent just deciding where things should go!  Seriously, that's what "software engineering" really is.  What should I name this?  Where should I put this logic?  What concepts should I express?  How should things communicate with each other?  Is it OK for this thing to know about some other thing?

This is abstraction.  You could just write everything in one long file, top to bottom, probably in C.  We've all been there, to some degree, and it's a nightmare.  So we slice things up and try to hide details.  We try to massage the code into some form that makes it easier to understand, more intuitive.  Abstraction is our only way to deal with complexity.

But abstraction itself is a form of complexity!  We are literally fighting complexity with complexity.

I think this is where some of the art of programming comes into play.  One of the elements of beautiful code is how it's able to leverage complexity but appear simple.

All of our design patterns, principles, and heuristics (from Clean Code) can largely be viewed as suggestions for "where to put things and what to name them" that have worked for people in the past.  It's more like an oral tradition passed from generation to generation than it is like the strict rules of Civil Engineering.  This doesn't make any of these patterns less important, but it does help explain why good design is such a difficult job, even with all these patterns.

In this series of posts I've blabbed on about a number of different design topics, each time almost coming to some kind of conclusion, but never very definitively.  That's because this has been sort of a public thought experiment on my part.  I'm just broadcasting some of the topics that I have been curious about recently.  And I think where I would like to leave this is with one final look at the complexity of abstraction.

Abstraction, by definition, is about hiding details.  And anyone who has ever done any nontrivial work with SQL Server (or any database) will understand how this hiding of details can be a double-edged sword.  If you had to understand everything in its full detail, you could never make it work.  But sometimes those hidden details stick out in weird ways, causing things to not work as you were expecting.  The abstraction has failed you.  But this is inevitable when hiding details; at some point, one of those details will surprise you.

Indirection and abstraction aren't exactly the same thing, but they feel very related somehow, and I think Zed Shaw would forgive me for thinking that.  Indirection, just like abstraction, is a form of complexity.  And just like abstraction, it can be incredibly useful complexity.  I assert that every time you wrap an implementation with an interface, whether it's a Header Interface or an Inverted Interface, you have introduced indirection.

In Bob Martin's Dependency Inversion Principle article, he lays out 3 criteria of bad design:

  1. Rigidity: Code is hard to change because every change affects too many parts of the system
  2. Fragility: When you make a change, unexpected parts of the system break
  3. Immobility: Code is difficult to reuse because it cannot be disentangled from the current application

I think there is clearly a 4th criterion which is: Complexity: Code is overly complicated making it difficult to understand and work with.  Every move in the direction of the first three criteria is a move away from the fourth.  Our challenge then, indeed our art, is to find the sweet spot between these competing extremes.

And so I ask you, is it a beautiful code type of thing?

Thursday, February 16, 2012

DIP: Loose or Leaky?


In the last post I looked a bit at the Dependency Inversion Principle which says to define an interface representing the contract of a dependency that you need someone to fulfill for you. It's a really great technique for encouraging SRP and loose coupling.

The whole idea behind the DIP is that your class depends only on the interface, and doesn't know anything about the concrete class implementing that interface.  It doesn't know what class it's using, nor does it know where that class came from.  And thus, we gain what we were after: the high level class can be reused in nearly any context by providing it with different dependencies.  And on top of that, we gain excellent separation of concerns, making our code base more flexible, more maintainable, and I'd argue more understandable.  Clearly, the DIP is awesome!

But!  I'm sure you were waiting for the But...  Since we now have to provide our object with its dependencies, we have to know:
  • Every interface it depends on
  • What instances we should provide
  • How to create those instances
This feels so leaky to me!  I used to have this beautiful, self-contained, well-abstracted object.  I refactor it to be all DIP-y, and now it's leaking all these details of its internal implementation in the form of its collaborators.  If I had to actually construct this object myself in order to use it, this would absolutely be a deal breaker!
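
To make that concrete, here's a minimal sketch borrowing names from my conference example; the interfaces and signatures here are made up for illustration, not taken from the real samples:

public interface IOrdersSpeakers { void PlaceAtEnd(int presentationId, string speakerName); }
public interface IPresentationSpeakersCache { void Recalculate(int presentationId); }

public class PresentationApi
{
    private readonly IOrdersSpeakers ordersSpeakers;
    private readonly IPresentationSpeakersCache cache;

    // Every dependency is now part of the public face of the class.
    public PresentationApi(IOrdersSpeakers ordersSpeakers, IPresentationSpeakersCache cache)
    {
        this.ordersSpeakers = ordersSpeakers;
        this.cache = cache;
    }

    public void AddSpeaker(int presentationId, string speakerName)
    {
        ordersSpeakers.PlaceAtEnd(presentationId, speakerName);
        cache.Recalculate(presentationId);
    }
}

// To get an instance at all, the caller now has to know which implementations
// exist and how to construct them:
// var api = new PresentationApi(new OrdersSpeakers(), new PresentationSpeakersCache());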

Fortunately, someone invented Inversion of Control Containers, so we don't have to create these objects ourselves.  Or, maybe unfortunately, 'cause now we don't have to create these objects ourselves, which sweeps any unsavory design issues away where you won't see them...

What design issues?  Well, the leaking of implementation details, obviously!  Are there others?  Probably.  Maybe class design issues with having too many dependencies?  Or having dependencies that are too small and should be rolled into more abstract concepts?  But neither of those is a result of injecting Inverted Interfaces; only the implementation leaking is.

I do believe this is leaky, but I'm not really sure if it's a problem exactly.  At the end of the day, we're really just designing a plugin system.  We want code to be reusable, so we want to be able to dynamically specify what dependencies it uses.  We're not forced to pass the dependencies in; we could use a service locator type approach.  But this has downsides of its own.

In the next post, I'll wrap this up by zooming back out and trying to tie this all together.

Wednesday, February 15, 2012

Header Interfaces or Inverted Interfaces?


Thanks to Derick Bailey for introducing me to this concept: Header Interfaces.  A Header Interface is an interface with the same name as the class, which exposes all of the class's methods, is generally not intended to be implemented by any other classes, and is usually introduced just for the purposes of testing.  Derick compared these to header files in C++.  In my earlier Interfaces, DI, and IoC are the Devil post, these are really the kinds of interfaces I was railing against.  (I'll use the ATM example from Bob Martin's ISP article throughout this post.)

public class AtmUI : IAtmUI {...}
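
To sketch what that looks like (the method names are only loosely based on the ATM example, so treat them as illustrative):

// A Header Interface: same name as the class, mirrors every public method,
// and typically has exactly one implementation.
public interface IAtmUI
{
    void RequestDepositAmount();
    void RequestWithdrawalAmount();
    void RequestTransferAmount();
    void InformInsufficientFunds();
}

public class AtmUI : IAtmUI
{
    public void RequestDepositAmount() { /* ... */ }
    public void RequestWithdrawalAmount() { /* ... */ }
    public void RequestTransferAmount() { /* ... */ }
    public void InformInsufficientFunds() { /* ... */ }
}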

One of the things I really found interesting about this was the idea that there are different kinds of interfaces.  So if Header Interfaces are one kind, what other kinds are there?

The first things that came to my mind were the Interface Segregation Principle and the Dependency Inversion Principle.  ISP obviously deals directly with interfaces.  It basically says that interfaces should remain small.  "Small" is always a tricky word; in this case it's about making sure that the clients consuming the interfaces actually use all the methods of the interface.  The idea is that if the client does not use all the methods, then you're forcing the client to depend on methods it has no use for.  Apparently this is supposed to make things more brittle.

I said "apparently" because I've never directly felt this pain.  I guess it really only comes into play if you have separate compilable components and you are trying to reduce how many things have to be recompiled when you make a change.  I use .NET, so Visual Studio, and it's not a picnic for controlling compilable components...  I think I have this requirement though, and just haven't figured out how to deal with it in .NET.  But for the moment, lets assume we agree that ISP is a good thing.

public class AtmUI : IDepositUI, IWithdrawalUI, ITransferUI {...}

The DIP leads to a similar place as ISP.  It tells us that higher level components should not depend on lower level components.  There is some room for uncertainty here around which components are higher than others.  For example, is the AtmUI a higher level than the Transaction?  I'll go with no, because the Transaction is the actual driver of the application; the UI is just one of its collaborators.  Because of this, the DIP leads us to create separate interfaces to be consumed by each Transaction:

public class AtmUI : IDepositUI, IWithdrawalUI, ITransferUI {...}
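
Sketched out, with each Transaction owning the narrow interface it consumes (again, illustrative names):

public interface IDepositUI
{
    void RequestDepositAmount();
}

public class DepositTransaction
{
    private readonly IDepositUI ui;

    public DepositTransaction(IDepositUI ui)
    {
        this.ui = ui;
    }

    public void Execute()
    {
        ui.RequestDepositAmount();
        // ...perform the deposit...
    }
}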

So, maybe there are at least two types of interfaces: Header Interfaces, and what I'll coin Inverted Interfaces.  In the last post I talked about the "Service Layer" pattern.  It generally leads to the creation of what feel more like Header Interfaces.  But this is tricky, because the only difference I can really find here is based on who owns the interface.  An Inverted Interface is owned by the class that consumes the interface, and a Header Interface is owned by the class that implements the interface.

But sometimes the difference isn't really that clear cut.  If you're TDDing your way through an application top-down in the GOOS style, the Service Layers are designed and created based on the needs of the "higher" level components.  So the component and its interface both spring into existence at the same time.  So if the service only has one consumer right now, the interface feels very Header-y.  On the other hand, it was created to fulfill the need of a higher level component; very Inverted-y.

But if someone else comes around and consumes the same service later: well now we have some thinking to do. If we reuse the interface, then I guess we've made it a Header Interface.  Would Uncle Bob have said to create a new interface but make the existing service implement it?  The lines are blurred because the components we're dealing with all reside within the same "package" and at least right now don't have any clear call to be reused outside this package.

Sadly, the introduction of these interfaces brings us back to Dependency Injection.  So in the next post, I'll look at the Dependency Inversion Principle, and the consequences of these Inverted Interfaces.

Tuesday, February 14, 2012

Service Layers or OOP?

In the last post, I mentioned that my co-workers and I had settled on a design that was working very well for us, but that wasn't very "object-oriented," at least not in the way Bob Martin described it.

We evolved to our approach through a lot of TDD and refactoring and plain old trial and error, but the final touches came when we watched Gary Bernhardt's Destroy All Software screencasts and saw that he was using pretty much the same techniques, but with some nice naming patterns.

I don't know if there is a widely accepted name for this pattern, so I'm just going to call it the Service Layer Pattern.  Its biggest strengths are its simplicity and clarity.  In a nutshell, I'd describe it by saying that for every operation of your application, you create a "service" class.  You provide the necessary context to these services, in our case as Active Record objects.  (NOTE: service does NOT mean remote service (i.e., REST); it just means a class that performs some function for us.)
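
A minimal sketch of one such service, assuming a hypothetical Presentation Active Record class:

public class RenamesPresentation
{
    // One operation, one service; the Active Record object is the context.
    public void Rename(Presentation presentation, string newTitle)
    {
        presentation.Title = newTitle;
        presentation.Save();
    }
}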

So far so basic; the real goodness comes when you add the layering.  I find there are a couple of ways to look at this.  The more prescriptive one is similar to DDD's (Domain Driven Design) "Layered Architecture," which recommends four layers: User Interface, Application, Domain, and Infrastructure.  From DDD:
The value of layers is that each specializes in a particular aspect of a computer program.  This specialization allows more cohesive designs of each aspect, and it makes these designs much easier to interpret.  Of course, it is vital to choose layers that isolate the most important cohesive design aspects.
In my Demonstrating the Costs of DI code examples the classes and layers looked like this:
SpeakerController (app)
 > PresentationApi (app)
   > OrdersSpeakers (domain)
   > PresentationSpeakers (domain)
     > Active Record (infrastructure)
   > Speaker (domain)
     > Active Record (infrastructure)
This concept of layering is very useful, but it's important not to think that a given operation will only have one service in each layer.  Another perspective on this that is less prescriptive but also more vague is the Single Responsibility Principle.  The layers emerge because you repeatedly refactor similar concepts into separate objects for each operation your code performs.  It's still useful to label these layers, because it adds some consistency to the code.

Each of these services is an object, but that doesn't make this an object-oriented design.  Quite the opposite, this is just well organized SRP procedural code.  Is this Service Layer approach inferior to the OOP design hinted at by Uncle Bob?  Or are these actually compatible approaches?

The OOP approach wants to leverage polymorphism to act on different types in the same way.  Does that mean that if I have a service, like OrdersParties, that I should move it onto the Party object?  What about the PartyApi class, should I find some way of replacing that with an object on which I could introduce new types?

There is a subtle but important distinction here.  Some algorithms are specific to a given type: User.Inactivate().  What it means to inactivate a user is specific to User.  Contrast that with User.HashPassword().  Hashing a password really has nothing to do with a user, except that a user needs a hashed password.  That is, the algorithm for hashing a password is not specific to the User type.  It could apply to any type, indeed to any string!  Defining it on User couples it to User, preventing it from being used on any string in some other context.
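
A quick sketch of that distinction (the hash below is just a stand-in to make the code complete, not a recommendation for real password storage):

public class User
{
    public bool Active { get; private set; }
    public string PasswordHash { get; set; }

    // What "inactive" means is specific to User, so the behavior belongs here.
    public void Inactivate()
    {
        Active = false;
    }
}

public static class PasswordHasher
{
    // Knows nothing about User; it works on any string.
    public static string Hash(string plainText)
    {
        using (var sha = System.Security.Cryptography.SHA256.Create())
        {
            var bytes = sha.ComputeHash(System.Text.Encoding.UTF8.GetBytes(plainText));
            return System.Convert.ToBase64String(bytes);
        }
    }
}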

Further, some algorithms are bigger than a single type.  Ordering the speakers on a presentation doesn't just affect one speaker; it affects them all.  Think how awkward it would be for this algorithm to reside on the Speaker object.  Arguably, these methods could be placed on Presentation, but then Presentation would have a lot of code that's not directly related to a presentation, but instead to how speakers are listed.  So it doesn't make sense on Speaker, or on Presentation.

Some algorithms are best represented as services, standing on their own, distinctly representing their concepts.  But these services could easily operate on Objects, as opposed to Data Structures, allowing them to apply to multiple types without needing to know anything about those specific types.  So I think the Service Layers approach is compatible with the OOP approach.
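
A sketch of what that might look like, with hypothetical names:

using System.Collections.Generic;

public interface IOrderable
{
    void MoveTo(int position);   // behavior, not exposed data
}

public class OrdersItems
{
    // The service never needs to know the concrete types it is ordering,
    // so new types can be added without touching it.
    public void Reorder(IReadOnlyList<IOrderable> items)
    {
        for (var i = 0; i < items.Count; i++)
        {
            items[i].MoveTo(i + 1);
        }
    }
}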

In the next post I'll take a look at how interfaces fit into this picture.

Monday, February 13, 2012

Objects or Data Structures?

Here's a great (and old) article from Bob Martin called Active Record vs Objects.  You should read it.  I think it might be one of the best treatments of the theoretical underpinnings of Object Oriented design I've read, especially because it pays a lot of heed to what OOP is good at, and what it's not good at.

Here's some of my highlights:
  • Objects hide data and export behavior (very tell-don't-ask)
  • Data structures expose data and have no behavior
  • Algorithms that use objects are immune to the addition of new types
  • Algorithms that use data structures are immune to the addition of new functions
  • Apps should be structured around objects that expose behaviors and hide the database
This all feels right and stuff, but it's all pretty theoretical and doesn't help me decide if my code is designed as well as it could be.  And that's what I'm going to be writing about.  In one post a day for the rest of the week I'll look at various elements of "good design," and try to fit the pieces together in some way I can apply to my code.
Good designers use this opposition to construct systems that are appropriately immune to the various forces that impinge upon them.
He's talking about the opposition between objects and data structures in terms of what they're good for.  So apparently a good designer is a psychic who can see the future.

But that is the hard part: how do you know where you'll need to add "types" vs. where you'll need to add "functions"?  Sometimes it's really obvious.  But what I'm starting to think about is, maybe I need to get more clever about the way I think about types.  Because if Uncle Bob thinks apps should be structured around objects, that means he thinks there are lots of examples where you're going to need to add a new type.  Whereas, when I think about my code, I'm not really finding very many examples where I could leverage polymorphism to any useful effect.

This could be because the problems I'm solving for my application are simply better suited for data structures and functions.  Or it could be because I'm just not approaching it from a clever enough OO angle.

Recently, my co-workers and I had pretty well settled on a design approach for our application, and it has been working extremely well for us.  However, this article and its clear preference for objects and polymorphism has me wondering if there may be another perspective that could be useful.  I'll talk more about this in the next post.

Sunday, February 5, 2012

Demonstrating the Costs of DI

Here's a followup to my previous post "Interfaces, DI, and IoC are the Devil"

This time I want to demonstrate a little of what I'm talking about with some code samples.  I have contrived an example.  I wanted something that is real, so I took the code structure and the responsibilities from some real code in one of our applications.  But I changed the names to make it more easily understood.  I also removed some of the concerns, such as transaction handling and record locking, just to make it shorter.  I'm trying to be fair, so the example is not trivial, but it is also not complicated.

Note that I didn't compile any of this code, so please excuse typos or obvious syntactic errors I might have overlooked.

Pretend we are writing an application to manage a conference.  It's an MVC 3 C# web app.  We have presentations, and presentations have speakers.  The speakers are ordered (1, 2, 3, etc.).  As an optimization, we will cache some of the information about the speakers for a given presentation: how many there are and who the #1 ordered speaker is.  This information will be cached on the PresentationSpeakers table.

The structure of the code is as follows.  The SpeakerController's Create action method is called to add a new speaker to a presentation.  This controller delegates the job to the PresentationApi class.  This class deals with coordinating the various domain services, and in real life would have dealt with database transactions and record locking/concurrency.  PresentationApi delegates to OrdersSpeakers (which obviously assigns the order numbers to the speakers) and PresentationSpeakers (which caches information about the speakers on a presentation).  Finally the Speaker and Presentation classes are active record objects.

This first example demonstrates the simplest way in which I would like to write this code.  There are no interfaces, there is no constructor injection, and it is using the Active Record pattern for database persistence.
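
Since the samples themselves aren't reproduced here, this is roughly the shape of version #1; the details are illustrative, not the exact code:

using System.Web.Mvc;

public class SpeakerController : Controller
{
    public ActionResult Create(int presentationId, string speakerName)
    {
        new PresentationApi(presentationId).AddSpeaker(speakerName);
        return RedirectToAction("Index", new { presentationId });
    }
}

public class PresentationApi
{
    private readonly int presentationId;

    public PresentationApi(int presentationId)
    {
        this.presentationId = presentationId;
    }

    public void AddSpeaker(string name)
    {
        var speaker = Speaker.Create(presentationId, name);    // Active Record
        new OrdersSpeakers().PlaceAtEnd(speaker);               // domain service
        LoadPresentationSpeakersRecord().Recalculate();         // cached count / #1 speaker
    }

    private PresentationSpeakers LoadPresentationSpeakersRecord()
    {
        return PresentationSpeakers.ForPresentation(presentationId);   // Active Record
    }
}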

The next example addresses the "problem" of the controller having a direct dependency on the PresentationApi class by adding IPresentationApi and injecting it through the controller's constructor.  Notice that I also convert PresentationApi to be a singleton at this point and remove its instance variables.  This isn't strictly required, but it is typical.  Notice how I now have to pass the presentationId into the LoadPresentationSpeakersRecord helper method.
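
Again as a rough sketch of the shape described, not the exact sample:

using System.Web.Mvc;

public interface IPresentationApi
{
    void AddSpeaker(int presentationId, string name);
}

public class SpeakerController : Controller
{
    private readonly IPresentationApi presentationApi;

    public SpeakerController(IPresentationApi presentationApi)
    {
        this.presentationApi = presentationApi;
    }

    public ActionResult Create(int presentationId, string speakerName)
    {
        presentationApi.AddSpeaker(presentationId, speakerName);
        return RedirectToAction("Index", new { presentationId });
    }
}

public class PresentationApi : IPresentationApi
{
    // Registered as a singleton, so no instance state; presentationId gets threaded through.
    public void AddSpeaker(int presentationId, string name)
    {
        var speaker = Speaker.Create(presentationId, name);
        new OrdersSpeakers().PlaceAtEnd(speaker);
        LoadPresentationSpeakersRecord(presentationId).Recalculate();
    }

    private PresentationSpeakers LoadPresentationSpeakersRecord(int presentationId)
    {
        return PresentationSpeakers.ForPresentation(presentationId);
    }
}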

In the third example, I remove PresentationApi's direct dependency on OrdersSpeakers.
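
In sketch form, that just means the ordering dependency now comes in through the constructor as an interface:

public interface IOrdersSpeakers
{
    void PlaceAtEnd(Speaker speaker);
}

public class PresentationApi : IPresentationApi
{
    private readonly IOrdersSpeakers ordersSpeakers;

    public PresentationApi(IOrdersSpeakers ordersSpeakers)
    {
        this.ordersSpeakers = ordersSpeakers;
    }

    public void AddSpeaker(int presentationId, string name)
    {
        var speaker = Speaker.Create(presentationId, name);
        ordersSpeakers.PlaceAtEnd(speaker);                             // injected now
        LoadPresentationSpeakersRecord(presentationId).Recalculate();
    }

    private PresentationSpeakers LoadPresentationSpeakersRecord(int presentationId)
    {
        return PresentationSpeakers.ForPresentation(presentationId);
    }
}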

Finally, I eliminate Active Record and replace it with the most evil pattern in the world, the Repository pattern.  I chose to implement this in the way I've most commonly seen, with one interface per database table (give or take).
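
The repositories, sketched (one per table, roughly; the method lists are illustrative):

public interface ISpeakerRepository
{
    Speaker Get(int id);
    void Add(Speaker speaker);
}

public interface IPresentationSpeakersRepository
{
    PresentationSpeakers GetByPresentation(int presentationId);
    void Save(PresentationSpeakers record);
}

// PresentationApi now takes ISpeakerRepository, IPresentationSpeakersRepository,
// and IOrdersSpeakers through its constructor, and every read or write goes
// through a repository instead of through the Active Record objects themselves.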

So, if you had your choice, which of these versions would you prefer to have in your code base? Which would you prefer to have to debug, understand, and maintain?

My answer, not surprisingly, is #1.  Notice how each version adds more and more cruft into the code and obscures what the code is actually trying to accomplish.  This is the cost of indirection.  It's really the same reason people like to fly the YAGNI flag.  And perhaps you've heard the old adage, "Do the simplest thing that could possibly work."  I desperately want as few constructs between me and what my code does as possible.  Which is why I yearn for the simple code in example #1.

PS.  If you actually read those code samples and studied the differences between them, pat yourself on the back!  I know this is very difficult to follow.  I actually wasn't intending to blog it, I just wanted to go through the motions myself and see how it turned out.  There have been quite a few times where I wrote something off because of my image of how it would turn out.  But then when I actually did it, it turned out much differently (MVC, MVP, and MVVM were all like that for me).  But in this case, it turned out just how I'd imagined it...  Crufty.

Wednesday, February 1, 2012

Interfaces, DI, and IoC are the Devil

I want to write unit tests. To do that, I need to be able to swap out the "dependencies" of the object under test with mocks, or stubs, or fakes. In static languages like C# or Java this is accomplished by two means:
  1. Using interfaces instead of the concrete class
  2. Injecting instances of those interfaces through the constructor (Dependency Injection)
There are other ways, but this seems to be the widely regarded "best" way.  And I don't like it.
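
For the record, that recipe looks something like this sketch (hypothetical names; Moq is used in the comments just as an example of the mocking side):

public interface IPaymentGateway
{
    void Charge(decimal amount);
}

public class OrderProcessor
{
    private readonly IPaymentGateway payments;   // 1. depend on the interface

    public OrderProcessor(IPaymentGateway payments)   // 2. inject it through the constructor
    {
        this.payments = payments;
    }

    public void Checkout(decimal amount)
    {
        payments.Charge(amount);
    }
}

// In a test, a mocking library (Moq, for example) can now stand in for the real gateway:
// var gateway = new Mock<IPaymentGateway>();
// new OrderProcessor(gateway.Object).Checkout(10m);
// gateway.Verify(g => g.Charge(10m));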

The core of my dislike stems from one simple thing: interfaces don't describe constructors.  So once I'm forced to depend on interfaces, I can no longer use the programming language to create objects.  I can't use the new keyword.  The simplest most fundamental feature of all objects, creating them, is barred from me.

To be clear, this isn't Dependency Injection's fault, nor is it IoC's fault.  It's the interface's fault.  Here's my favorite feature of Ruby:
MyClass.stub(:new) {...}
That simple line of code stubs out the new 'message' to the MyClass type, allowing me to return a stub instead of an actual instance of MyClass.  This demonstrates why Ruby is so much easier to test than today's static languages:
  1. You can intercept any message to the object w/o having to depend on an interface
  2. You can intercept the constructor the same way you'd intercept any other message
But back to the topic: why is it a problem that I can't use the new method?  In my experience this causes some nasty coding practices to emerge.  I'll take them in turn:

Everything is a singleton
As soon as I couldn't create instances, I wrote all of my objects in a stateless style (no instance variables) and registered them as singletons in the IoC container.  This leads to all sorts of other code smells within that object.  Since you can't promote variables to instance variables, you end up passing all your state into methods, leading to methods with lots of parameters, which is a Clean Code smell.  Methods passing parameters to other methods to other methods to other methods...
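
A sketch of how that tends to look (hypothetical names):

using System;
using System.Collections.Generic;

public class ReportBuilder
{
    // Stateless "singleton" style: no instance variables allowed, so the same
    // state gets threaded through every private method.
    public string Build(int accountId, DateTime from, DateTime to, string format)
    {
        var rows = LoadRows(accountId, from, to);
        return Render(rows, format, accountId, from, to);
    }

    private IList<string> LoadRows(int accountId, DateTime from, DateTime to)
    {
        // ...query something with the parameters...
        return new List<string>();
    }

    private string Render(IList<string> rows, string format, int accountId, DateTime from, DateTime to)
    {
        // ...and the parameter lists keep growing...
        return string.Join(Environment.NewLine, rows);
    }
}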

Custom "init" methods are introduced
To combat the last problem, you might add an initialize method to take the place of the constructor.  This will work, though it's non-standard and confusing.  You also have to decide how to register the class in the IoC container.  I've seen this done with a singleton, where the Init method clears the old state and sets up the new state, WHICH IS EVIL.  Or you can have it create a new instance each time, but then you'll never know how it's configured when you're using it; more on that later.

Classes that could benefit from refactoring to have instance variables don't get refactored
Both of the above contribute to this problem.  When you add that next "context" parameter to your stateless class, it may be the straw that breaks the camel's back, causing you to refactor those parameters into instance variables.  But time and time again I've seen the uncertainty around the lifecycle of the class in IoC lead people to delay this refactoring.  Again, more on the lifecycles later.

Factory interfaces are created to wrap the constructors
Finally, when I'm fed up with the above, I introduce a factory class whose sole purpose is to wrap the constructor of the class I actually want.  This is just bloat, plain and simple.  And it's yet another class that gets registered with the IoC container.
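
Something like this sketch, assuming an OrdersSpeakers class whose constructor takes the state it needs:

public interface IOrdersSpeakersFactory
{
    OrdersSpeakers Create(int presentationId);
}

public class OrdersSpeakersFactory : IOrdersSpeakersFactory
{
    // Pure bloat: its only job is to stand in for "new" so that it, too, can be
    // registered with the container and mocked in tests.
    public OrdersSpeakers Create(int presentationId)
    {
        return new OrdersSpeakers(presentationId);
    }
}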

I also have some complaints about DI and IoC when they are leveraged simply to enable unit testing.  I'd like to call into question some of the typical assumptions around DI/IoC, and we'll see where it takes me.

Interfaces are good because they make your code loosely coupled
This is the most common assumption I see that I disagree with.  I've seen this in Fowler's writing on DI and in the GOOS book, which makes me seriously doubt myself here.  But no amount of time or practice has changed my opinion.  The argument is that it's good to use interfaces over ALL OF YOUR CLASSES because it allows you to make them ALL PLUGGABLE.

I find this completely ridiculous!  There are certainly classes that it is useful to have be "pluggable," but they are few and far between.  The majority of the classes that I create are Single Responsibility, extracted from other classes, and small.  They serve a very pointed, very well understood, and very well defined purpose in the larger web of objects.  The chances I would want to PLUG IN a different version of this class are virtually zero (aside from for testing).  I might change it as the design of my domain changes over time, and that will be easy because it's small and focused.  But I'm not going to suddenly wish it was pluggable!  I might even argue that wrapping these things in interfaces hurts cohesion.

I should clarify: I do want to swap these classes out so I can test, and that's why I'm forced to use interfaces.  But the argument that using interfaces on all classes for their own sake, even if I wasn't testing, is a good thing is one I just find outrageous.  It's an unnecessary level of indirection, adding more complexity, and giving me nothing useful in return: YAGNI.

IoC is good because it makes the lifecycle of my objects someone else's problem
This is another argument I've frequently seen in favor of IoC: that moving the responsibility for the "lifecycle" of your objects all the way up to the top of your application is a good thing.  I find it mind-boggling.  Again, there are a minority of cases where this is useful.  An NHibernate session that needs to be created for each web request comes to mind.

But again, 90% of the time I've got an instance of an object that wants to delegate some task pertaining to its current state to some other object.  I want that object to come into existence, perform some task for me, and go away.  Basically, if I wasn't encumbered by interfaces and DI and IoC, I'd new up the object, passing in the necessary state, and tell it to do what I need.  The garbage collector would take care of the rest.  But when I'm burdened by IoC, suddenly there is a lot of uncertainty around what the lifecycle of each object should be.  And you can't tell what lifecycle a given injected object has.  This is a significant issue that affects how you will use that object, and how that object is designed.

Dependency Injection is good because it makes my object's dependencies obvious
I've encountered this argument in many places too, notably this blog post.  The argument roughly goes: "To unit test an object you need to know what its dependencies are so you can mock them.  But if those dependencies aren't clearly specified in the constructor, how will you know what they are?!  So DI is awesome because it clearly specifies the dependencies!"

This is such nonsense it's really quite amusing.  Knowing what the dependencies are doesn't even BEGIN to help me mock them out.  I have to know what methods are called, with what parameters, expecting what return value, and how many times!  In other words, I have to know everything about how the object uses that dependency to successfully and cleanly mock it out.  A list of what dependencies are used barely scratches the surface and is definitely NOT a compelling reason to use Dependency Injection!

Another downside of IoC/DI that I've often run into is that it spreads like a virus through your code.  Once you start using DI, every class ABOVE that class in the call stack also has to use DI.  Unless you're OK with using the container as a service locator, which few people are.  So essentially, if you're going to use DI you're going to have it in the majority of your classes.

I'm painfully aware that this post is just a lot of complaining and contains no useful suggestions toward correcting any of these issues.  What I want is what Ruby has, but C# just doesn't support that.  I may experiment with some forms of service locator, leveraging extension methods.  Or I might try to create some sort of internally mockable reusable factory concept.  I'll write it up if it leads anywhere, but I doubt it will.

So I guess the only point of this post is to vent some of my frustrations with DI, IoC, and interfaces and get feedback from all of you.  Perhaps you have felt some of this pain too and can validate that I'm not totally crazy?  Or maybe, and I sincerely hope this is true, I'm missing some fundamental concept or perspective on this and you can fill me in?  Or might it even be possible that this could be a first step toward cleaning up some of these pain points, even if just in a small way?

Or you might read all of this and shrug it off.  You're used to it.  It's always been this way, it can't be any other way, go about your business.  If that's your thinking as you're reading this sentence, I'd just encourage you to re-examine your assumptions about what's possible.  Take a look at Bernhardt's testing in DAS.  THEN come back and give me a piece of your mind.