Wednesday, May 30, 2012

Book: Windows Powershell in Action


Windows PowerShell in Action by Bruce Payette
My rating: 5 of 5 stars

One of the most enjoyable technology-specific books I've ever read.  Usually books that teach you a language or a framework are pretty dry and uninspiring, but this one was great.  The examples are good at illustrating the points without going overboard.  But by far my favorite parts were the little asides where the author explains difficult design decisions the PowerShell team had to make.



Tuesday, May 29, 2012

Minor Issues: Query Results vs. Models

I want to take a look at a minor issue that crops up in a very common application structure where you have a list of data, possibly from a search, that the user selects from to view details.

There are some minor issues that must be addressed, and they all have to do with queries, especially when we're dealing with SQL.  There will be a query that returns the list by gathering all the data, maybe doing some formatting, and joining to all the relevant tables.  For example, if it's a list of books it will return the title, publish date, author (join to author table; format name), and genre (join to genre table).

Apart from listing the books, the app also needs to be able to add new books.  This will work as follows:
  1. A dialog pops up with all the fields to fill in
  2. On save, if everything validates, the book is saved in the database
  3. The new book is added to the list with AJAX (did I mention it's a web app?)
Since I don't want to leave you hanging, here are the "minor issues" I'm going to look at:
  • Query performance (N+1 Select)/Query complexity
  • Formatting logic
  • Type conversion
To illustrate my points, I'll use the Active Record pattern.  Using the book example, a naive implementation of the query might look like this:
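var books = Books.All();
foreach (var book in books) {
  // display the data by reading it this way:
  var title     = book.Title;
  var published = book.PublishedDate.ToString("MM/dd/yyyy");
  var author    = book.Author.FormattedName;  // lazy loads the Author
  var genre     = book.Genre.Name;            // lazy loads the Genre
}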
Some things to note about this code:
  • It suffers from the N+1 Select problem because for each book it does a query to lazy load the author and another query to lazy load the Genre (technically that's N+2).
  • It formats the date with a .NET format string.
  • It formats the author name using the format logic built in to the Author class in the FormattedName property
The first is a serious issue that we *must* correct, but there isn't anything inherently wrong with the other two.  

Query performance/complexity
To fix the N+1 Select problem, eager loading could be applied.  Eager loading is an ORM feature that includes joins in your query and expands those into referenced objects without a separate database call.  Entity Framework, for example, has a nice method called Include so you could write .Include("Author").Include("Genre").  NHibernate allows you to define this as part of the mapping.

This solves the N+1 Select problem, and is generally good enough for a simple example.  But when the query is more complicated, using the ORM to generate the SQL can be troublesome.  And it's worth pointing out that written this way, the SQL will return all the fields from all the rows it joined to and selected from, even if only a small subset is needed.  This may or may not affect performance, but it will impact the way indexes are defined.
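To make the difference in round trips concrete, here's a toy sketch that just counts "queries".  FakeDb and SelectCounting are made-up names, and the in-memory arrays stand in for a real database; no actual ORM is involved, so treat it as an illustration of the counting and nothing more.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a database that counts round trips; not a real ORM.
public class FakeDb
{
    public int QueryCount { get; private set; }

    private static readonly string[] Authors = { "Author A", "Author B" };

    // Each book row: a title plus the index of its author.
    private static readonly Tuple<string, int>[] Books =
    {
        Tuple.Create("Book 1", 0),
        Tuple.Create("Book 2", 1),
        Tuple.Create("Book 3", 0)
    };

    // One round trip: the book list only.
    public List<Tuple<string, int>> AllBooks()
    {
        QueryCount++;
        return Books.ToList();
    }

    // One round trip per call: this is what a lazy load does.
    public string AuthorById(int id)
    {
        QueryCount++;
        return Authors[id];
    }

    // One round trip that returns books already joined to their authors.
    public List<Tuple<string, string>> AllBooksWithAuthors()
    {
        QueryCount++;
        return Books.Select(b => Tuple.Create(b.Item1, Authors[b.Item2])).ToList();
    }
}

public static class SelectCounting
{
    // Lazy style: 1 query for the list, then 1 more for each book's author.
    public static int LazyQueryCount()
    {
        var db = new FakeDb();
        foreach (var book in db.AllBooks())
        {
            var author = db.AuthorById(book.Item2);
        }
        return db.QueryCount;
    }

    // Eager style: the join happens inside a single query.
    public static int EagerQueryCount()
    {
        var db = new FakeDb();
        foreach (var row in db.AllBooksWithAuthors())
        {
            var author = row.Item2;
        }
        return db.QueryCount;
    }
}
```

With three books, the lazy version makes four round trips (N+1) and the eager version makes one.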

The N+1 Select problem can also be solved by not using Books.All(), and instead writing a SQL query to do the necessary joins and come back with only the required data.  There are two clear benefits to this:
  1. Using SQL directly means there are no limits on what features of the database can be used.  Plus, the query can be optimized however needed.
  2. Only the required data fields need to be selected, instead of all the fields.  And data from more than one table can be returned in one select without fancy eager loading features.
To represent the results, a Query Result class can be defined.  This class will be very similar to the AR models, but only contain properties for the returned fields.  
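As a rough sketch, a Query Result class for the book list might look something like this.  The table and column names in the SQL (Books, Authors, Genres, AuthorId, GenreId) and the GenreName property are assumptions made up for illustration; the real schema will differ.

```csharp
using System;

// Hypothetical query result for the book list: one property per selected column.
public class BookListResult
{
    // Assumed table and column names; adjust to the real schema.
    public const string Sql = @"
        SELECT b.Title, b.PublishedDate,
               a.FirstName, a.MiddleName, a.LastName,
               g.Name AS GenreName
        FROM Books b
        JOIN Authors a ON a.Id = b.AuthorId
        JOIN Genres g ON g.Id = b.GenreId";

    public string Title { get; set; }
    public DateTime PublishedDate { get; set; }
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
    public string GenreName { get; set; }
}
```

Only the columns the list actually displays are selected, and both joins happen in one round trip.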

Formatting Logic
But this is where those two other bullet points from earlier come into play.  Remember how the date was formatted with a .NET format string?  In a custom query, this can easily be moved into the query result object.  It's the formatting of the author name that is going to cause some trouble.

Pretend there are three columns that represent name: FirstName, MiddleName, LastName.  There are three choices for how to format this into a single name display:
  1. Put the formatting logic in the select statement of the SQL query (duplicates the logic on Author)
  2. Put the formatting logic in a property of the query result object (duplicates the logic on Author)
  3. Refactor Author and call its method to format the name (awkward)
To explain, here's what Author might have looked like:
public class Author {
  ...
  public string FormattedName { get { return FirstName + " " + MiddleName + " " + LastName; } }
}
This formatting logic is coupled to the fields of the Author class, and so it can't be reused. To make it reusable, it could be refactored into a function that takes the fields as parameters. One way might look like:
public class Author {
  ...
  public string FormattedName { get { return FormatName(FirstName, MiddleName, LastName); } }
  public static string FormatName(string first, string middle, string last) {
    return first + " " + middle + " " + last;
  }
}
This is now in a format that could be used from within our query result object:
public class BookListResult {
  ...
  public string FormattedName { get { return Author.FormatName(FirstName, MiddleName, LastName); } }
}
Part of me loves this, and part of me hates it.

Type Conversion
The other issue with the Query Result approach has to do with the AJAX part of our scenario.  Remember how we wanted to add the book to the top of the list after the add?  Well, our view that renders the list item is going to be typed to expect a BookListResult, which is what the query returns.  However, after the Add, the code will have a Book instance, not a BookListResult.  So this requires a way to convert a Book into a BookListResult.  I usually do this by adding a constructor to BookListResult that accepts a Book, and that constructor then "dots through" the book collecting all the data it needs.
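Here's a minimal sketch of that conversion constructor.  The Book, Author, and Genre classes below are bare stand-ins for the Active Record models, and the property names on BookListResult (including GenreName) are assumptions for illustration.

```csharp
using System;

// Bare stand-ins for the Active Record models; the real ones carry persistence logic.
public class Author
{
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
}

public class Genre
{
    public string Name { get; set; }
}

public class Book
{
    public string Title { get; set; }
    public DateTime PublishedDate { get; set; }
    public Author Author { get; set; }
    public Genre Genre { get; set; }
}

public class BookListResult
{
    public string Title { get; set; }
    public DateTime PublishedDate { get; set; }
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
    public string GenreName { get; set; }

    // Used when filling results from the custom SQL query.
    public BookListResult() { }

    // Used after an add: "dots through" the Book to collect the same
    // fields the SQL query selects.
    public BookListResult(Book book)
    {
        Title = book.Title;
        PublishedDate = book.PublishedDate;
        FirstName = book.Author.FirstName;
        MiddleName = book.Author.MiddleName;
        LastName = book.Author.LastName;
        GenreName = book.Genre.Name;
    }
}
```

After saving, the controller could hand new BookListResult(book) to the same list item view the query results use.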

From a certain perspective, this can be viewed as duplicating the query logic because knowledge of what fields the QueryResult's data comes from appears in two places: once in terms of the physical SQL tables in the SQL query, and again in terms of the Active Record objects.

Yet somehow I still prefer the Custom Query approach to the eager loading approach...  I just like to have that absolute control over the SQL query.  The cost of the boilerplate code here is worth it to me if it means I can directly leverage the query features of my database (like row number, and full text, and CTEs and pivots, etc etc).

As in the last "Minor Issues" post (constructors and MVC controllers), I'd love to hear your thoughts or experiences with these patterns.

Thursday, May 24, 2012

Hg Bookmarks Made Me Sad

Branches
Hg's branches are "permanent and global", meaning they are fixed to the changeset, can't be deleted, and can't be left behind when you push changes on that branch.

This is in contrast to git's branches, which are temporary and are not part of the changesets.  I think of them as pointers.

It can be nice to have branch names on your commits, because it adds some meaningful context to the commits.  It makes understanding your history very easy.  The only downside that I am aware of is the potential for name collisions.  Someone might try to create a branch using a name that someone else had already used.  In which case you should really just use a different name...  If there are other downsides, I don't know what they are.

Workflow
However, it has always been the recommendation of the Mercurial team that branches be used for long-lived branches, not short-term topic branches.  Pre-2.0 they probably would have recommended using local clones; now they recommend using Bookmarks.  I've found local clones less than ideal for my workflow, which typically looks like this:
  1. Get latest changes
  2. Decide what task to work on next
  3. Create a branch-line to work on that task
  4. Hack hack hack
  5. If it takes awhile (>8hrs), merge default into the branch-line
  6. If I have to go home for the night and I'm not done, push branch to remote as backup
  7. Push branch-line to remote, have someone do a code review
  8. Land branch-line on default, close branch-line
Some things to note about this:
  • The branches are small and generally short lived, one per "topic"
  • I want to push them to the remote for backup purposes (in case my computer fries over night or something)
  • I want to push them to remote so others can collaborate
This is why named-branches are so much more convenient for me than local clones.

However, in a recent release of Mercurial, they added a new notification message when you create a branch which says: "(branches are permanent and global, did you want a bookmark?)"  So they couldn't be much more clear and in my face about the fact they think I should be using bookmarks for my workflow instead of named-branches.

Bookmarks
Bookmarks are basically Mercurial's version of git's temporary short-lived branches.  It means I'll lose the nice branch names on my commits in history.  But I won't have to worry about name conflicts.  This already doesn't seem like a worthwhile trade, but I'm willing to take the Mercurial devs' word for it and try it out.  Sadly I found them, in their current state (2.2.1), to be bug prone and impractical.  For the remainder of this post, I'd like to explain what I don't like about them as they are now.  But since I don't want to have to explain the basics here, you should go read about them first: http://mercurial.selenic.com/wiki/Bookmarks.  I'd like to throw this one little caveat in before I start, which is to say that it's totally possible I am misusing these things.  I sincerely hope that's the case and someone will point out a better way to me.  But I couldn't find any good real workflow examples of bookmarks, so I had to figure it out on my own.

Must create my own 'master' bookmark, everyone on the team must use this bookmark
When I create my first bookmark and commit on it, I've just created two heads on the default branch. I can easily find the head of my topic branch, it has a bookmark, but how do I find the head of the mainline?

Worse, say I publish the bookmark to the remote server and you do a pull -u.  You will be updated to my topic bookmark, because it's the tip.  That is NOT what either of us wanted.  I created a topic branch because I didn't want it to get in your way.  In fact, you shouldn't have to be aware of my branch at all!

So bookmarks are broken before we even get out of the gate.  The workaround is to create a 'master' bookmark that points at the mainline head.  Everyone on the team will have to be aware of this bookmark, and they'll have to be careful to always update to it.

Must merge 'master' with 'master@1' and bookmark -d 'master@1'
The next problem happens when you and I have both advanced the master bookmark.  In the simplest case, maybe we both just added a new changeset directly on master.  Let's say you push first, and I pull.  If we weren't using bookmarks, hg would notify me when I pulled that there were multiple heads on my branch and it would suggest I do a merge.  So I'd merge your update with my update and be back to only one head on the branch.

With bookmarks, it's more confusing.  Hg will notify me that it detected a divergent bookmark, and it will rename your master bookmark to master@1 and leave it where it was.  It will leave mine named master and leave it where it was.  Now I have to "hg merge master@1; hg bookmark -d master@1;"

As a side note here, I was curious how git handles this problem, since git's branches are implemented so similarly to hg's bookmarks.  The core difference is that git won't let you pull in divergent changes from a remote into your branch without doing a merge.  It's conceptually similar to renaming the bookmark to master@1, since what git technically does is pull the changes into a "remote tracking branch" (that's a simplification, but close enough), and then merge that remote tracking branch onto your branch.  But it has a totally different feel when you're actually using it.

Can't hg push, or it will push my changes without my bookmark
This is the most devastating issue.  If I have created a new topic bookmark and committed on it, and then I do "hg push", it's going to push my changes to the remote without my bookmark!  The bookmarks only get pushed when you explicitly push them with "hg push -B topic".  Which means if I'm using bookmarks, I can't ever use the hg push command without arguments, or I'm going to totally confuse everyone else on the team with all these anonymous heads.

It's true that as long as the team is using the master bookmark and their own topic bookmarks, they shouldn't really have any problems here...  But it's still totally confusing, and totally not what I wanted.

Suggestions
The Mercurial team feels very very very strongly about maintaining backwards compatibility.  So it's probably a pipe dream to hope that this might change.  But I have two suggestions on how these problems might be mitigated.  These suggestions probably suck, but here they are anyway.

Hg up should prefer heads without bookmarks
If I do a pull -u and it brings down a new head, but that head has a bookmark, hg up should update to the head WITHOUT the bookmark.  This would allow me to use bookmarks without them getting in the way of other members of the team.

I think it would also allow me to not have to create the 'master' bookmark.  When I wanted to land a topic bookmark, I would just do: "hg up default; hg merge topic; hg ci -m "merged topic";"  Since "default" is the branch name, hg would prefer the head without bookmarks, which would be the mainline.

Hg push should warn if pushing a head with a bookmark
This would be consistent with hg's treatment of branches.  When you hg push, if you have a new branch, it aborts and warns you that you're about to publish a new branch.  You have to do hg push --new-branch.  I think it should do the same thing for bookmarks.  This would prevent me from accidentally publishing my topic bookmarks.

I <3 Hg
I really like Mercurial.  Even in the hg vs. git battle, I tend to prefer hg.  I love how intuitive its commands are, I love how awesome its help is, I love its "everything is just a changeset in the DAG" model (vs. git's "you can only see one branch at a time, what's a DAG?" model).  And that's why bookmarks are making me sad.  Every time I create a branch, hg tells me I'm doing it wrong, but bookmarks are way too unfriendly right now (unless I'm missing something huge [it wouldn't be the first time]).

I still strongly recommend Hg.  If you're still using CVS, or Subversion, or heaven help you TFS, you should take a look at Mercurial.

And if you're a Mercurial expert (or a Mercurial developer!) please help me understand how to use bookmarks correctly!

PS.  I thought about drawing little graph pictures to help explain the issues I laid out here, but I don't have a decent drawing tool at my disposal, and I didn't think this rant really deserved any more time than I've already put in.  Hopefully you were able to make sense out of all these words.

Monday, May 21, 2012

Minor Issues: Constructors and MVC Controllers

Recently I've been getting into F# a little.  It's a really cool language which has been broadening my perspective on problem solving with code.  It's a .NET language, and is mostly compatible with C#, but it does do some things differently.  For example, it has more rigorous rules around constructors:
open System

type Point(x : float, y : float) =

  member this.X = x
  member this.Y = y

  // Secondary constructors must end by calling the main constructor
  new() = new Point(0.0, 0.0)

  new(text : string) =
    let parts = text.Split([|','|])
    let x = Double.Parse(parts.[0])
    let y = Double.Parse(parts.[1])
    new Point(x, y)
It may not immediately jump out at you, but there are some really cool things here that C# doesn't do, or at least doesn't insist on:
  1. This actually defines 3 constructors; the "main" constructor is implicitly defined to take in floats x and y
  2. Constructors can call other constructors!  Note the empty constructor, new().  (C# can chain constructors too, with ": this(...)", but nothing forces you to.)
  3. All constructors ARE REQUIRED to call the "main" constructor
I fell in love with this immediately.  This requirement forces me to be keenly aware of what my class's core data really is.  And it communicates that knowledge very clearly to any consumers of the class as well.  This was especially refreshing because I have found that since C# added object initialization syntax (new Point { X = 1.0, Y = 2.0 }) I've started writing far fewer constructors.  Constructors are boilerplate and annoying to type, so I largely stopped typing them.  But now that I have done that for a while and I have a few real world examples of classes without constructors, I find that I miss the constructors.  They communicate something nice about the core, and most important, data of the class that a whole lot of properties don't communicate.

So, that sounds pretty straightforward, and I should start writing constructors on my classes again.  And if I want to be really strict (which I kind of do), I shouldn't provide a default constructor either.  Then I'll be living in the rigorous world of class constructors, just like F#.
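For comparison, here's a rough sketch of what that stricter style might look like in C#.  Nothing in the language enforces it; the discipline is that only one constructor assigns the core data, every other constructor chains to it with ": this(...)", and there is deliberately no default constructor.  ParsePart is a hypothetical helper I made up for the example.

```csharp
using System;
using System.Globalization;

public class Point
{
    public double X { get; private set; }
    public double Y { get; private set; }

    // The "main" constructor: the only one that assigns the core data.
    public Point(double x, double y)
    {
        X = x;
        Y = y;
    }

    // Secondary constructor: chains to the main one, as F# would require.
    public Point(string text)
        : this(ParsePart(text, 0), ParsePart(text, 1))
    {
    }

    // Hypothetical helper: pulls one number out of text like "1.5,2.5".
    private static double ParsePart(string text, int index)
    {
        return double.Parse(text.Split(',')[index], CultureInfo.InvariantCulture);
    }
}
```

With this in place, new Point() no longer compiles, which is exactly the pressure the rest of this post is about.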

But this is where MVC Controllers finally make their first appearance in this post.  Because these controllers exert their own pressure on my classes and downright require a default (parameter-less) constructor.  At least that's the case with the way my team writes our controllers.  Why?  Here's an example.

Let's talk about CRUD.  Typically there's a controller action we call "New", it returns an empty form so you can create a new record.  This form posts to a controller action we call "Create", which binds the form data to a model, and calls .Save().  We're using MVC's standard form builder helpers, which generate form field names and IDs based on the model expression you provide as a lambda.  This is how it knows how to bind the form back to your model on "Create".  But this means you have to pass an empty model out the door in the "New" to generate the empty form.  An empty model requires a default constructor!  So the code looks like this:
public ActionResult New()
{
  ViewData.Model = new Point();
  return View();
}

[HttpPost]
public ActionResult Create(Point point)
{
  point.Save();
  return View();
}
Obviously, real code is more complicated than that, but you get the idea.

And so I find myself with a minor issue on my hands.  On the one hand, I want to create F# inspired rigorous classes.  But on the other hand I want simple controllers that can just send the Model out the door to the view.  Alas, I can't have both, something has to give.

Obviously I could give up on the Constructors.  Or, I could give up on passing my model directly to the view.  There are other approaches well documented in this View Model Patterns post.  The quick description is I could NOT pass my model, and instead pass a View Model that looks just like my model.  And then I'd have to map that View Model back onto my Model somehow...  But that comes with its own minor issues.
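A bare-bones sketch of that View Model idea might look like this.  PointViewModel and ToModel are names I made up for illustration; the point is only that the view model keeps the default constructor and settable properties the form helpers want, while the model stays strict.

```csharp
// The strict model: no default constructor.
public class Point
{
    public double X { get; private set; }
    public double Y { get; private set; }

    public Point(double x, double y)
    {
        X = x;
        Y = y;
    }
}

// Hypothetical view model: settable properties and an implicit default
// constructor, so an empty instance can be sent out for the "New" form.
public class PointViewModel
{
    public double X { get; set; }
    public double Y { get; set; }

    // Map the bound form values back onto the real model.
    public Point ToModel()
    {
        return new Point(X, Y);
    }
}
```

The "New" action would send out an empty PointViewModel, and "Create" would bind to a PointViewModel and call ToModel() before saving.  The mapping code is the new boilerplate.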

So how about it?  How do you deal with this issue?

Monday, May 14, 2012

Powershell Listing SQL Table Columns

SQL Server ships with an awesome PowerShell utility called sqlps that both lets you execute SQL statements against a database, and implements a file system provider over SQL.

One of the things I use this for all the time is to inspect the columns of a table.  Management Studio's tree view is terrible for this, especially compared to the flexibility of powershell which allows you to do things like:

  1. Sort the columns by name, or by data type
  2. Filter the columns by name, or by data type, or by nullable, etc
Here's a series of commands I use a lot that I thought was worth sharing:
  1. sqlps
  2. cd sql\localhost\default\databases\mydatabase\tables\schema.table\columns
  3. ls | where { $_.Name -notlike 'ColPrefix*' } |  select Name, @{Name="Type"; Expression={"$($_.DataType.Name)($($_.DataType.MaximumLength))"}}, Nullable
That will display all the columns that DO NOT have a name starting with ColPrefix and will show you each column's Name, Data Type (formatted like "nvarchar(255)"), and whether it allows nulls.

Enjoy!

Tuesday, May 8, 2012

Selfish Programmers: less flame-baity

Last post too flame-baity for you?
 Fair enough!

It's far too easy to confuse "easy" with "simple."  Rich Hickey touches on this a bit in this presentation, which is very similar to another talk he gave that I blogged about earlier.  Almost everything he said in these talks was very thought provoking for me.  But the one that really hit home the hardest was this concept of easy vs. simple.

The difference between easy and simple is rather hard to firmly pin down.  One way to think of it might be that easy means less effort.  Fewer keystrokes, fewer concepts.  The less I have to type, the easier it is to do something.  The less I have to know, the easier it is.  The fewer structures between me and what I'm trying to accomplish, the easier.

But easier doesn't necessarily mean simpler.  Hickey associates simpler with un-twisted.  So code that is DRY, and SOLID would be simple.  Even if it requires more keystrokes, classes, and curly braces to write.

I find myself falling for this a lot.  Sometimes it might be simpler to do more work, but that's hard to see.  On the other hand, it's incredibly easy for me to judge how fun something will be for me to do, or how much tedious effort something will require.

The problem is that EASY is about me, where SIMPLE is about my code.  So the deck is stacked against us as software developers.  It's going to be difficult to separate what's easy for us from what's simple for our code and make the right design decision.

Being aware of this distinction is useful.  And I certainly wasn't as aware of it before watching Hickey's talk.  But it does raise an interesting question of how can we keep ourselves honest?  How can we notice when we're doing the easy thing instead of the simple thing?  While at the same time avoiding doing too much and over complicating?

Monday, May 7, 2012

Selfish Programmers

The biggest movement in software today is selfishness.

Ok, I've only been here for a short time, so what do I know, maybe it's always been like this.  And people being selfish doesn't really constitute "a movement" (though I wouldn't be surprised if some people would be willing to argue that our generation is a very selfish one, I'm not sure how you would prove that we're more selfish than previous generations were at our age...).

What DOES constitute a movement is the continuous push toward tools that make a programmer's job "easier."

Yeah, you got that right, I'm about to take a stance against making things easier.  Here are some examples of tech stuff that's supposed to "make things easier":
  • Visual Studio
    • WinForms/WPF designers
    • WCF
    • Solutions and project files
    • Entity Framework (database first)
    • Linq-to-Sql
  • Active Record
  • DataMapper (ORM)
  • Convention instead of configuration (I'm looking at you Rails)
  • ASP.NET MVC's UpdateModel (and Validations)
  • All Ruby code ever written
This stuff is "easier" because it requires "less work".  Usually boring tedious work.  No one likes boring tedious work (except on Monday morning when you're really tired).  So naturally we try to get rid of that work.  There are different strategies for getting rid of it.  Microsoft likes to get rid of it by replacing it with drag and drop and magic tools that hide the details of what's going on so you don't have to learn anything.  Ruby on the other hand puts extreme focus on minimalism and pretty syntax, favoring as few keystrokes as possible.

But do we think about what that's costing us, or costing our future selves?  Nope!  We're selfish sons of bitches and all we care about is doing stuff that we enjoy and think is "elegant" with as few keystrokes and effort as possible!

We like drag and drop and magic tools because it saves all that time learning things and typing.  Unfortunately, it also dramatically reduces flexibility, so as soon as you step outside the boundary of the demo application, you find yourself stuck.  Now the complexity of your solution skyrockets as you hack around the limitations of the drag and drop and magic.

And we like minimalism, cause it feels like simplicity.  Our model not only looks exactly like the database, but it's automatically created by reflecting over the database, and then we send those models directly to the UI and mash them into our views!  IT'S SIMPLE!  Well, it's less code, but is it simple?  You've intimately twisted the knowledge of your database through your entire code base, leaving you with the pressures of physical storage, UI layout, and controllers all tugging on the same objects.  Every time you remove a layer or a concept from your code to make it simpler, you run the risk of tangling concepts and paying for it in the long run (Of course, every time you add a layer or a concept you run the risk of over abstracting, it's a balance).

In conclusion: stop being so selfish!  Sometimes the best way to do something isn't the most fun, or elegant, or easy way.  Toughen up!  Do the work!

Dogma-less TDD and Agile


TDD is about rapid feedback of the code to the developer.  Agile is about rapid feedback of the application to the development team.

Everything else is just BS.

Here are some of the things that fall into the BS category:
  • Up front design
  • "Architecture"
  • Project plans
  • Estimates/Story Points
  • Information Radiators
  • Team Velocity
  • Specifications
  • Code Reviews
  • QA
  • Approval processes
  • 100% Test Coverage
It's not that these things don't have a purpose, or aren't useful.  But they are all afflicted with varying degrees of BS (baseless guessing, built in uncertainty, outright lying, and occasionally even complete denial of reality).  

What most of these things have in common is team organization.  A one man team doesn't need this stuff. But any more than one person, and you require some way of keeping everyone on the same page.  Especially if you are building software that all of the teammates don't completely understand.  Without some kind of organization, people would be chasing their own ideas in all different directions.  And since they don't fully understand the "business," those ideas are likely to be wrong (or at least partly wrong).

Thus, teams need a certain amount of BS.  But I think it's important to remember the distinction.  The most important thing to delivering real value is feedback.  Feedback in code.  And feedback in features.  You need the BS, but apply it carefully, and try to keep the BS out of your feedback loops!