Tuesday, May 29, 2012

Minor Issues: Query Results vs. Models

I want to take a look at a minor issue that crops up in a very common application structure where you have a list of data, possibly from a search, that the user selects from to view details.

There are some minor issues that must be addressed, and they all have to do with queries, especially when we're dealing with SQL.  There will be a query that returns the list by gathering all the data, maybe doing some formatting, and joining to all the relevant tables.  For example, if it's a list of books it will return the title, publish date, author (join to author table; format name), and genre (join to genre table).

Apart from listing the books, the app also needs to be able to add new books.  This will work as follows:
  1. A dialog pops up with all the fields to fill in
  2. On save, if everything validates, the book is saved in the database
  3. The new book is added to the list with AJAX (did I mention it's a web app?)
Since I don't want to leave you hanging, here are the "minor issues" I'm going to look at:
  • Query performance (N+1 Select)/Query complexity
  • Formatting logic
  • Type conversion
To illustrate my points, I'll use the Active Record pattern.  Using the book example, a naive implementation of the query might look like this:
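var books = Books.All();
foreach (var book in books) {
  // display the data by accessing it this way:
  var title = book.Title;
  var published = book.PublishedDate.ToString("MM/dd/yyyy");
  var author = book.Author.FormattedName;  // lazy loads the Author (one more query)
  var genre = book.Genre.Name;             // lazy loads the Genre (one more query)
}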
Some things to note about this code:
  • It suffers from the N+1 Select problem because for each book it does a query to lazy load the author and another query to lazy load the genre (technically that's 2N+1 queries).
  • It formats the date with a .NET format string.
  • It formats the author name using the format logic built into the Author class in the FormattedName property.
The first is a serious issue that we *must* correct, but there isn't anything inherently wrong with the other two.  
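
To make the query count concrete, here is a small simulation. It doesn't touch a real database or ORM: FakeDb, its methods, and the flattened AuthorName/GenreName properties are all made up for this illustration. It just counts how many round trips the naive loop would trigger.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a database; it only counts round trips.
public static class FakeDb {
  public static int QueryCount = 0;

  public static List<Book> AllBooks(int n) {
    QueryCount++; // the single query that loads the list of books
    var books = new List<Book>();
    for (int i = 0; i < n; i++) books.Add(new Book { Title = "Book " + i });
    return books;
  }

  public static string LoadAuthorName() { QueryCount++; return "Some Author"; }
  public static string LoadGenreName() { QueryCount++; return "Some Genre"; }
}

public class Book {
  private string authorName, genreName;
  public string Title { get; set; }
  // Each lazy property hits the fake database the first time it is read.
  public string AuthorName { get { return authorName ?? (authorName = FakeDb.LoadAuthorName()); } }
  public string GenreName { get { return genreName ?? (genreName = FakeDb.LoadGenreName()); } }
}

public static class Demo {
  // Runs the naive loop over n books and returns the number of queries issued.
  public static int CountQueries(int n) {
    FakeDb.QueryCount = 0;
    var lines = new List<string>();
    foreach (var book in FakeDb.AllBooks(n)) {
      lines.Add(book.Title + " / " + book.AuthorName + " / " + book.GenreName);
    }
    return FakeDb.QueryCount;
  }
}
```

With these stand-ins, Demo.CountQueries(10) comes out to 21: one query for the list plus two per book.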

Query performance/complexity
To fix the N+1 Select problem, eager loading could be applied.  Eager loading is a tool of ORMs that includes joins in your query and expands those into referenced objects without a separate database call.  Entity Framework, for example, has a nice method called Include so you could write .Include("Author").Include("Genre").  NHibernate allows you to define this as part of the mapping.

This solves the N+1 Select problem, and is generally good enough for a simple example.  But when the query is more complicated, using the ORM to generate the SQL can be troublesome.  And it's worth pointing out that, written this way, the SQL will return all the fields from all the rows it joined to and selected from, even if only a small subset is needed.  This may or may not affect performance, but it will affect how indexes need to be defined, since a covering index would have to include every selected column.

The N+1 Select problem can also be solved by not using Books.All(), and instead writing a SQL query to do the necessary joins and come back with only the required data.  There are two clear benefits to this:
  1. Using SQL directly means there are no limits on what features of the database can be used.  Plus, the query can be optimized however needed.
  2. Only the required data fields need to be selected, instead of all the fields.  And data from more than one table can be returned in one select without fancy eager loading features.
To represent the results, a Query Result class can be defined.  This class will be very similar to the AR models, but only contain properties for the returned fields.  
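
As a rough sketch of what that could look like for the book list: the table names, column names, and the FromRecord helper below are assumptions for illustration, not a real schema. The class carries the SQL it was designed for and maps one row of a data reader into a result.

```csharp
using System;
using System.Data;

public class BookListResult {
  // Hypothetical table and column names; adjust to the real schema.
  public const string Sql = @"
    select b.Title, b.PublishedDate,
           a.FirstName, a.MiddleName, a.LastName,
           g.Name as GenreName
    from Books b
    join Authors a on a.Id = b.AuthorId
    join Genres g on g.Id = b.GenreId";

  // Only the fields the list actually needs.
  public string Title { get; set; }
  public DateTime PublishedDate { get; set; }
  public string FirstName { get; set; }
  public string MiddleName { get; set; }
  public string LastName { get; set; }
  public string GenreName { get; set; }

  // Builds one result from the current row of a data reader.
  public static BookListResult FromRecord(IDataRecord row) {
    return new BookListResult {
      Title = (string)row["Title"],
      PublishedDate = (DateTime)row["PublishedDate"],
      FirstName = (string)row["FirstName"],
      MiddleName = row["MiddleName"] as string, // stays null when the column is DBNull
      LastName = (string)row["LastName"],
      GenreName = (string)row["GenreName"]
    };
  }
}
```

Everything comes back in one select, and nothing is loaded that the list doesn't display.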

Formatting Logic
But this is where those two other bullet points from earlier come into play.  Remember how the date was formatted with a .NET format string?  In a custom query, this can easily be moved into the query result object.  It's the formatting of the author name that is going to cause some trouble.

Pretend there are three columns that represent name: FirstName, MiddleName, LastName.  There are three choices for how to format this into a single name display:
  1. Put the formatting logic in the select statement of the SQL query (duplicates the logic on Author)
  2. Put the formatting logic in a property of the query result object (duplicates the logic on Author)
  3. Refactor Author and call its method to format the name (awkward)
To explain, here's what Author might have looked like:
public class Author {
  ...
  public string FormattedName { get { return FirstName + " " + MiddleName + " " + LastName; } }
}
This formatting logic is coupled to the fields of the Author class, and so it can't be reused. To make it reusable, it could be refactored into a function that takes the fields as parameters. One way might look like:
public class Author {
  ...
  public string FormattedName { get { return FormatName(FirstName, MiddleName, LastName); } }
  public static string FormatName(string first, string middle, string last) {
    return first + " " + middle + " " + last;
  }
}
This is now in a format that could be used from within our query result object:
public class BookListResult {
  ...
  public string FormattedName { get { return Author.FormatName(FirstName, MiddleName, LastName); } }
}
Part of me loves this, and part of me hates it.

Type Conversion
The other issue with the Query Result approach involves the AJAX part of our scenario.  Remember how we wanted to add the new book to the list after saving it?  Well, our view that renders the list item is going to be typed to expect a BookListResult, which is what the query returns.  However, after the add, the code will have a Book instance, not a BookListResult.  So this requires a way to convert a Book into a BookListResult.  I usually do this by adding a constructor to BookListResult that accepts a Book, and that constructor then "dots through" the book collecting all the data it needs.
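
A minimal sketch of that constructor, assuming simplified shapes for the models: the Author, Genre, and Book classes below are bare stand-ins for the real Active Record classes, and the property names on BookListResult are my guess at what the list needs.

```csharp
using System;

// Minimal stand-ins for the Active Record models (assumed shapes).
public class Author {
  public string FirstName { get; set; }
  public string MiddleName { get; set; }
  public string LastName { get; set; }
}

public class Genre {
  public string Name { get; set; }
}

public class Book {
  public string Title { get; set; }
  public DateTime PublishedDate { get; set; }
  public Author Author { get; set; }
  public Genre Genre { get; set; }
}

public class BookListResult {
  public string Title { get; set; }
  public DateTime PublishedDate { get; set; }
  public string FirstName { get; set; }
  public string MiddleName { get; set; }
  public string LastName { get; set; }
  public string GenreName { get; set; }

  // Used when the result is filled in from a row of the SQL query.
  public BookListResult() { }

  // Used after an add: "dots through" the Book to collect the same data.
  public BookListResult(Book book) {
    Title = book.Title;
    PublishedDate = book.PublishedDate;
    FirstName = book.Author.FirstName;
    MiddleName = book.Author.MiddleName;
    LastName = book.Author.LastName;
    GenreName = book.Genre.Name;
  }
}
```

The view only ever sees a BookListResult, whether it came from the query or from new BookListResult(book) right after a save.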

From a certain perspective, this can be viewed as duplicating the query logic because knowledge of what fields the QueryResult's data comes from appears in two places: once in terms of the physical SQL tables in the SQL query, and again in terms of the Active Record objects.

Yet somehow I still prefer the Custom Query approach to the eager loading approach...  I just like to have that absolute control over the SQL query.  The cost of the boilerplate code here is worth it to me if it means I can directly leverage the query features of my database (like row numbers, full-text search, CTEs, pivots, etc.).

As in the last "Minor Issues" post (constructors and MVC controllers), I'd love to hear your thoughts or experiences with these patterns.
