Monday, March 26, 2012

Are You A Hacker Or A Craftsman?

It's usually viewed as an either/or type of thing: Either you're a hacker, or you're a craftsman.  The problem with any discussion around this is that the words have no clear and fixed definition.  They mean something different to everyone.  So conversations about this end up largely unproductive as everyone keeps talking past one another.

Let's start here.  Hackers are generally characterized as rebels quickly banging out some code that does something impressive, but the code is unmaintainable crap that no one else can understand.  And the hacker is generally only concerned with short term goals.  On the other hand, craftsmen are characterized as deliberately and carefully designing beautiful code, but it takes them forever and they're encumbered by things like tests and principles and patterns.  And the craftsman is usually looking ahead to the long term (at least where maintenance is concerned).

I don't think these either/or characterizations are useful.  Here's a completely different take on "Hacker" from Rands in Repose talking about Facebook.  He characterizes hackers as "believing something can always be better", and not accepting limitations preventing them from making it better.  In the positive light, these "hackers" reject restrictive process and seek to be disruptive by doing new and better things.  In a negative light, these "hackers" reject collaboration, are unreasonable, unpredictable, and not motivated by the same goals as the business.

This is not even close to being the same as the colloquial meaning of "hacker," but it's an interesting blend of hacker and craftsman.  It has the hacker's rebellious qualities, combined with the craftsman's broader vision.

And it's here that I think there is an interesting point to be made.  Hackers lose points for not caring about the future, or the long term.  And craftsmen lose points for losing sight of the broader objectives due to their uncompromising attention to code details.

Software development is nothing if not compromise.  The team that shits out awful code quickly gets done first*.  But then can't respond to change.  The perfectionists with 100% code coverage and perfect SOLID code come in last*.  But then can't effectively respond to change.

* Yes, asterisks.  The team that writes the awful code probably won't finish first, because they spend most of their time debugging.  And it's possible the team with 100% code coverage won't finish last, at least that's the argument, which you can believe if you want to (but I think it largely depends on your experience and your tooling).

I think it's pretty clear that there is a time for hacking, and there is a time for craftsmanship.  The real trick is figuring out, at any given moment, which mindset you should be in.  And that requires a very broad understanding of the factors of the project, including: the vision of the product, the long term strategy, the short term goals, the complexity of the code, and the likelihood of change.  All of which is subject to change.  So good luck!!

Monday, March 19, 2012

Rebasing Is For Liars

Rebasing is very popular in the Git community.  Of course, I'm primarily a Mercurial guy, but we have rebasing too.  It's a built-in extension; you just have to turn it on.

What is rebasing?  The typical scenario goes something like this:
  1. You make changes and commit one or more changesets
  2. Meanwhile, other people have committed changes
  3. You pull down those changes
  4. But instead of merging, you rebase
  5. Which detaches your changes from history and reapplies them after the changes you pulled in
People like this because it keeps the history linear, avoiding "merge bubbles."  And certainly linear history is much easier to understand.  
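
In Mercurial, that workflow looks roughly like this (the rebase extension ships with Mercurial and just needs to be enabled; Git users get the same effect with git pull --rebase):

    # enable the extension once in your .hgrc (or Mercurial.ini)
    [extensions]
    rebase =

    # pull in everyone else's changes and replay your local commits on top
    hg pull --rebase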

But I have a problem with rebasing: it's lying.  Understanding the context that changes were made in can be very useful, but rebasing rewrites the history, changing the parent pointers, and thereby changing the context.  Lying about what the code looked like when you changed it.

That said, I still use rebase.  But only when my changes are small or inconsequential and I know that the consequences of lying about what the code looked like when I made those changes won't matter at all.  And in those cases, it's nice to reorder the history to be sequential because it does limit the conceptual overhead of understanding those kinds of changes.  But in general, I prefer to see the merges simply because they accurately represent what really happened.

Monday, March 12, 2012

Simple Made Easy


"Rich Hickey emphasizes simplicity’s virtues over easiness’, showing that while many choose easiness they may end up with complexity, and the better way is to choose easiness along the simplicity path."

I absolutely recommend you take the hour to watch this presentation.  It's pretty easy viewing, he's funny, and I found it very influential.

Highlights
"Your ability to reason about your program is critical to changing it without fear."  This has been something I've firmly believed for a very long time, but I love how succinctly Hickey puts it here.  He even has the courage to challenge the two most popular practices of Software Engineering today: Agile, and TDD.  For Agile, he's got this line: "Agile and XP have shown that refactoring and tests allow us to make change with zero impact.  I never knew that, I still do not know that."  Agile is supposed to make the fact of change one of the primary motivators behind how the project is run, but it doesn't really make applying that change any easier in the code...  For TDD he has this wonderful quip:
"I can make changes 'cause I have tests!  Who does that?!  Who drives their car around banging against the guard rails saying, "Whoa!  I'm glad I've got these guard rails!"
He calls it guard rail programming.  It's a useful reminder that while tests are definitely valuable, they can't replace design and thoughtful coding.

Another very enlightening comment he made had to do with the difference between enjoyable-to-write code and a good program.  This rang very true with me, probably because of all the Ruby bigots these days who are obsessed with succinct or "beautiful" code, but are still writing big balls of mud.  Hickey basically said he doesn't care how good a time you had writing the program.  He cares about whether its complexity yields the right solution, and whether it can be reasoned about and maintained.

Which leads to another concept he brings up: Incidental Complexity vs. Problem Complexity.  This is the argument that the tools you choose to use in your software can bring along extra complexity that has nothing whatsoever to do with the actual problem your program is supposed to solve.

Hickey Says I'm Wrong
I just wrote a series of posts where I was attempting to question some of the assumptions behind what are commonly considered good design practices in static object-oriented languages today:
  1. Interfaces, DI, and IoC are the Devil
  2. Demonstrating the Costs of DI
  3. Objects or Data Structures
  4. Service Layers or OOP
  5. Header Interfaces or Inverted Interfaces
  6. DIP: Loose or Leaky?
  7. Abstraction: Blessing or Curse?
I covered a lot of stuff in that series.  One of the things I was really challenging is the practice of hiding every object behind an interface.  I argued this indirection just made things more complicated.  At about 50 minutes in, Rich Hickey says every object should only depend on abstractions (interfaces) and values.  To depend on a concrete instance is to intertwine the "What" with the "How," he says.  So, he's saying I'm wrong.

I also talked about how Dependency Injection is leaky and annoying.  But Rich Hickey says you want to "build up components from subcomponents in a direct-injection style, you want to, as much as possible, take them as arguments", and you should have more subcomponents than you probably have right now.  So, yeah, I'm wrong.
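
Here's a minimal C# sketch of what I understand that direct-injection style to look like; the names (Task, ITaskRepository, TaskService) are mine, not his, so take it as an illustration rather than anything from the talk:

    // A simple value-ish type: the "what".
    public class Task
    {
        public string Title { get; private set; }
        public Task(string title) { Title = title; }
    }

    // The abstraction the component depends on.
    public interface ITaskRepository
    {
        void Save(Task task);
    }

    // One concrete "how".
    public class SqlTaskRepository : ITaskRepository
    {
        public void Save(Task task) { /* write to the database */ }
    }

    // The component depends only on the abstraction, and its subcomponent
    // is handed in as a constructor argument ("direct-injection style")
    // rather than new'd up internally.
    public class TaskService
    {
        private readonly ITaskRepository _repository;

        public TaskService(ITaskRepository repository)
        {
            _repository = repository;
        }

        public void Create(string title)
        {
            _repository.Save(new Task(title));
        }
    }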

I didn't actually blog about this one, but I've certainly talked about it with a lot of people.  I've been a proponent of "service layers" because I want my code to be as direct as possible.  I want to be able to go one place, and read one code file, and understand what my system does.  For example, if I send an email when you create a task, I want to see that right there in the code.  But Hickey says it's bad to have object A call to object B when it finishes something and wants object B to start.  He says you should put a queue between them.  So, wrong again!
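
Something like this is what I take him to mean (again a hypothetical C# sketch, with a plain in-memory queue standing in for whatever messaging infrastructure you'd really use):

    using System.Collections.Concurrent;

    public class TaskCreated
    {
        public string Title { get; set; }
    }

    // Object A doesn't call object B; it just publishes a message.
    public class TaskCreator
    {
        private readonly ConcurrentQueue<TaskCreated> _queue;

        public TaskCreator(ConcurrentQueue<TaskCreated> queue)
        {
            _queue = queue;
        }

        public void Create(string title)
        {
            // ...save the task...
            _queue.Enqueue(new TaskCreated { Title = title });
        }
    }

    // Object B drains the queue on its own schedule; it never appears
    // in A's code, which is exactly the part that bothers me.
    public class EmailNotifier
    {
        public void Drain(ConcurrentQueue<TaskCreated> queue)
        {
            TaskCreated message;
            while (queue.TryDequeue(out message))
            {
                // send the "task created" email here
            }
        }
    }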

I'm also a proponent of Acceptance Test Driven Development (ATDD) and writing English specs that actually test the system.  Hickey says that's just silly, and recommends using a rules engine outside your system.  :(

And finally, and this is the biggest one, he says: 
"Information IS simple.  The only thing you can possible do with information is RUIN it!  Don't do it!  We got objects, made to encapsulate IO devices.  All they're good for is encapsulating objects: screens and mice.  They were never supposed to be applied to information!  And when you apply them to information, it's just wrong.  And it's wrong because it's complex.  If you leave data alone, you can build things once that manipulate data, and you can reuse them all over the place and you know they are right.  Please start using maps and sets directly."
Um, yeah, ouch.  I'm an object oriented developer.  I read DDD and POEAA three years ago and got really excited about representing all my information as objects!  We extensively prototyped data access layers, Entity Framework and NH chief among them.  We settled on NH.  Worked with it for a while but found it too heavy-handed.  It hid too much of SQL and clung too much to persistence ignorance.  But I couldn't really understand how to use a Micro-ORM like Massive (or Dapper or PetaPoco) because I was too hung up on the idea of Domain Objects.  So we spiked an ORMish thing that used Massive under the covers.  It supported inheritance and components and relationships via an ActiveRecord API.  It gave us the flexibility to build the unit testing I always wanted (which I recently blogged about).  It is still working quite well.  But it's information represented as objects.  So it's wrong...
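
For contrast, here's my rough C# rendering of the "just use maps" idea next to the object version; it's a sketch, and the names are made up:

    using System.Collections.Generic;

    // Information as an object: every new field means changing this class
    // and everything that maps to it.
    public class Contact
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    // Information as plain data: generic code like this is written once
    // and works on contacts, tasks, or anything else.
    public static class Records
    {
        public static IDictionary<string, object> Merge(
            IDictionary<string, object> a,
            IDictionary<string, object> b)
        {
            var merged = new Dictionary<string, object>(a);
            foreach (var pair in b)
                merged[pair.Key] = pair.Value;
            return merged;
        }
    }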

In case you didn't pick up on it, Rich Hickey wrote Clojure, a functional language.  I don't know anything about functional programming.  I've been meaning to learn some F#, but haven't gotten that into it yet.  So it doesn't really surprise me that Hickey would think everything I think is wrong.  Functional vs. OOP is one of the biggest (and longest running) debates in our industry.  I think it is telling that I've felt enough pain to blog about lots of the things that Hickey is talking about.  But I don't find it disheartening that his conclusions are different than mine.  It is possible that he is right and I am wrong.  It is also possible that we are solving different problems with different tools with different risks and vectors of change and different complexities.  Or, maybe I really should get rid of all my active record objects and just pass dictionaries around!

In any case, this certainly was a very eye opening presentation.

Monday, March 5, 2012

Database Seed Data

Almost two years ago, we started a major new development effort at Pointe Blank.  We had the extreme fortune to be starting some new projects on a completely new platform (namely ASP.NET MVC instead of WinForms), which gave us the opportunity to take all the things we'd learned and start from scratch.

One of the main things we had learned was the importance of designing for automation.  Probably the single most valuable automation-related thing we did was automate the scripting of our database.  We are using FluentMigrator to define migrations and psake with sqlcmd to create and drop database instances.  I'm not all that crazy about FluentMigrator (its syntax is overly verbose and its code seems overly complex) but it has worked very well for us so far.
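
For anyone who hasn't used it, a FluentMigrator migration looks roughly like this (a trimmed-down sketch, with made-up table and column names):

    using FluentMigrator;

    [Migration(201203050001)]
    public class CreateStatesTable : Migration
    {
        public override void Up()
        {
            Create.Table("States")
                .WithColumn("Id").AsInt32().PrimaryKey().Identity()
                .WithColumn("Code").AsString(2).NotNullable()
                .WithColumn("Name").AsString(50).NotNullable();
        }

        public override void Down()
        {
            Delete.Table("States");
        }
    }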

We use this tool in a few different circumstances:

  1. To apply other people's changes to our local dev databases
  2. To migrate our QA environment
  3. To spin up from scratch (and tear down) automated test environments
  4. To deploy database changes to production
Database migrations have made working with our code base so much easier, it's really amazing.  You can spin up a new environment with one command in seconds.  Incorporate other devs' changes without a second thought.

Migration tools pay most of their attention to schema changes, but data gets largely ignored.  So much so that we had to rough in our own approach to dealing with data, which works, but also sucks.  And there don't seem to be any clear recommendations for data.  There are some categories of data I'm concerned with:
  1. List Data: these are values that are usually displayed in drop downs and are either not user generated, or there is a default set we want to seed (ex: states, name suffixes, record types, etc)
  2. Configuration Data: this is data that is required for the system to work, (ex: server uris, an email address to send from, labels to display in the app, etc)
There are only two approaches to dealing with seed data that I'm aware of:
  1. Run the schema migrations, then insert all the seed data
  2. Insert the seed data inside the migrations
At first blush #1 seemed easier, so it's the approach we originally took, but it has some drawbacks. The first challenge is to avoid duplicates.  You can do that with IF NOT EXISTS(...) statements, or by inserting Primary Keys and ignoring PK violation errors.  The second is dealing with schema changes to tables with seed data.

For example, suppose you write a migration to create a table, and you write insert statements to add seed data to it.  You have already run your migrations and seeded the database.  Now you write a new migration which adds a new not null column to the table.  This migration will fail because the table already has data from the seeds and the new column requires a value.  In this situation, you're hosed.  You have to write the migration to add data into that column before making the column not allow nulls (or use a default value).  Effectively you have to duplicate the seed data for that column in both the migration and the seed data file.  You need it in both places in order to support creating the database from scratch and migrating an existing database.
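
Concretely, that migration ends up looking something like this sketch (FluentMigrator, made-up names; you could also just add the column with a default value in one step):

    using FluentMigrator;

    [Migration(201203050002)]
    public class AddRegionToStates : Migration
    {
        public override void Up()
        {
            // 1. Add the column nullable so existing seeded rows don't blow up.
            Alter.Table("States")
                .AddColumn("Region").AsString(20).Nullable();

            // 2. Backfill the rows that are already there.
            Execute.Sql("UPDATE States SET Region = 'Unknown' WHERE Region IS NULL");

            // 3. Now it's safe to make it NOT NULL.
            Alter.Column("Region").OnTable("States").AsString(20).NotNullable();
        }

        public override void Down()
        {
            Delete.Column("Region").FromTable("States");
        }
    }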

#2 seems to have its own downsides too.  There is no one place that shows you all the data that will be seeded in a given table; it's spread throughout the migrations.  It precludes you from having a schema.rb type of system (unless that system could somehow be smart enough to include data).  That point is somewhat academic though, because FluentMigrator doesn't have anything like this.

However, #2 is much more flexible and can accommodate anything you could ever dream of doing with data (that SQL would support, of course).  And I feel that pure #2 would be better than a #1/#2 hybrid, because it's just more straightforward.
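
For the record, seeding inside a migration looks something like this in FluentMigrator (another sketch with made-up data):

    using FluentMigrator;

    [Migration(201203050003)]
    public class SeedNameSuffixes : Migration
    {
        public override void Up()
        {
            // Seed data lives right next to the schema it depends on,
            // so creating from scratch and migrating both just work.
            Insert.IntoTable("NameSuffixes").Row(new { Suffix = "Jr." });
            Insert.IntoTable("NameSuffixes").Row(new { Suffix = "Sr." });
            Insert.IntoTable("NameSuffixes").Row(new { Suffix = "III" });
        }

        public override void Down()
        {
            Execute.Sql("DELETE FROM NameSuffixes WHERE Suffix IN ('Jr.', 'Sr.', 'III')");
        }
    }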

And now I'd like to ask you, do you know of a good way of dealing with seed data?  Are there any existing tools or strategies out there that you could link me to?