Monday, March 26, 2012

Are You A Hacker Or A Craftsman?

It's usually viewed as an either/or type of thing: Either you're a hacker, or you're a craftsman.  The problem with any discussion around this is that the words have no clear and fixed definition.  They mean something different to everyone.  So conversations about this end up largely unproductive as everyone keeps talking past one another.

Let's start here.  Hackers are generally characterized as rebels quickly banging out some code that does something impressive, but the code is unmaintainable crap that no one else can understand.  And the hacker is generally only concerned with short term goals.  On the other hand, craftsmen are characterized as deliberately and carefully designing beautiful code, but it takes them forever and they're encumbered by things like tests and principles and patterns.  And the craftsman is usually looking ahead to the long term (at least where maintenance is concerned).

I don't think these either/or characterizations are useful.  Here's a completely different take on "Hacker" from Rands in Repose talking about Facebook.  He characterizes hackers as "believing something can always be better", and not accepting limitations preventing them from making it better.  In the positive light, these "hackers" reject restrictive process and seek to be disruptive by doing new and better things.  In a negative light, these "hackers" reject collaboration, are unreasonable, unpredictable, and not motivated by the same goals as the business.

This is not even close to being the same as the colloquial meaning of "hacker," but it's an interesting blend of hacker and craftsman.  It has the hacker's rebellious qualities, combined with the craftsman's broader vision.

And it's here that I think there is an interesting point to be made.  Hackers lose points for not caring about the future, or the long term.  And craftsmen lose points for losing sight of the broader objectives due to their uncompromising attention to code details.

Software development is nothing if not compromise.  The team that shits out awful code quickly gets done first*.  But then can't respond to change.  The perfectionists with 100% code coverage and perfect SOLID code come in last*.  But then can't effectively respond to change.

* Yes, asterisks.  The team that writes the awful code probably won't finish first, because they spend most of their time debugging.  And it's possible the team with 100% code coverage won't finish last, at least that's the argument, which you can believe if you want to (but I think it largely depends on your experience and your tooling).

I think it's pretty clear that there is a time for hacking, and there is a time for craftsmanship.  The real trick is figuring out at any given moment, which mindset you should be in.  And that requires a very broad understanding of the factors of the project, including: the vision of the product, the long term strategy, the short term goals, the complexity of the code, and the likelihood of change.  All of which is subject to change.  So good luck!!

Monday, March 19, 2012

Rebasing Is For Liars

Rebasing is very popular in the Git community.  Of course, I'm primarily a Mercurial guy, but we have rebasing too.  It's a built-in extension, you just have to turn it on.

What is rebasing?  The typical scenario goes something like this:
  1. You make changes and commit one or more changesets
  2. Meanwhile, other people have committed changes
  3. You pull down those changes
  4. But instead of merging, you rebase
  5. Which detaches your changes from history and reapplies them after the changes you pulled in
People like this because it keeps the history linear, avoiding "merge bubbles."  And certainly linear history is much easier to understand.  

But I have a problem with rebasing: it's lying.  Understanding the context that changes were made in can be very useful, but rebasing rewrites the history, changing the parent pointers, and thereby changing the context.  Lying about what the code looked like when you changed it.

That said, I still use rebase.  But only when my changes are small or inconsequential and I know that the consequences of lying about what the code looked like when I made those changes won't matter at all.  And in those cases, it's nice to reorder the history to be sequential because it does limit the conceptual overhead of understanding those kinds of changes.  But in general, I prefer to see the merges simply because it accurately represents what really happened.

Monday, March 12, 2012

Simple Made Easy


Simple Made Easy
"Rich Hickey emphasizes simplicity’s virtues over easiness’, showing that while many choose easiness they may end up with complexity, and the better way is to choose easiness along the simplicity path."

I absolutely recommend you take the hour to watch this presentation.  It's pretty easy viewing, he's funny, and I found it very influential.

Highlights
"Your ability to reason about your program is critical to changing it without fear."  This has been something I've firmly believed for a very long time, but I love how succinctly Hickey puts it here.  He even has the courage to challenge the two most popular practices of Software Engineering today: Agile, and TDD.  For Agile, he's got this line: "Agile and XP have shown that refactoring and tests allow us to make change with zero impact.  I never knew that, I still do not know that."  Agile is supposed to make the fact of change one of the primary motivators behind how the project is run, but it doesn't really make applying that change any easier in the code...  For TDD he has this wonderful quip:
"I can make changes 'cause I have tests!  Who does that?!  Who drives their car around banging against the guard rails saying, "Whoa!  I'm glad I've got these guard rails!"
He calls it guard rail programming.  It's a useful reminder that while tests are definitely valuable, they can't replace design and thoughtful coding.

Another very enlightening comment he made had to do with the difference between enjoyable-to-write code and a good program.  This rang very true with me, probably because of all the Ruby bigots these days who are obsessed with succinct or "beautiful" code, but are still writing big balls of mud.  Hickey basically said he doesn't care about how good of a time you had writing the program.  He cares about whether its complexity yields the right solution, and whether it can be reasoned about and maintained.

Which leads to another concept he brings up: Incidental Complexity vs. Problem Complexity.  The argument is that the tools you choose to use in your software can bring along extra complexity that has nothing whatsoever to do with the actual problem your program is supposed to solve.

Hickey Says I'm Wrong
I just wrote a series of posts where I was attempting to question some of the assumptions behind many of what are commonly considered good design practices in static object-oriented languages today:
  1. Interfaces, DI, and IoC are the Devil
  2. Demonstrating the Costs of DI
  3. Objects or Data Structures
  4. Service Layers or OOP
  5. Header Interfaces or Inverted Interfaces
  6. DIP: Loose or Leaky?
  7. Abstraction: Blessing or Curse?
I covered a lot of stuff in that series.  One of the things I was really challenging is the practice of hiding every object behind an interface.  I argued this indirection just made things more complicated.  At about 50 minutes in, Rich Hickey says every object should only depend on abstractions (interfaces) and values.  To depend on a concrete instance is to intertwine the "What" with the "How," he says.  So, he's saying I'm wrong.

I also talked about how Dependency Injection is leaky and annoying.  But Rich Hickey says you want to "build up components from subcomponents in a direct-injection style, you want to, as much as possible, take them as arguments", and you should have more subcomponents than you probably have right now.  So, yeah, I'm wrong.

I didn't actually blog about this one, but I've certainly talked about it with a lot of people.  I've been a proponent of "service layers" because I want my code to be as direct as possible.  I want to be able to go one place, and read one code file, and understand what my system does.  For example, if I send an email when you create a task, I want to see that right there in the code.  But Hickey says it's bad to have object A call to object B when it finishes something and wants object B to start.  He says you should put a queue between them.  So, wrong again!

I'm also a proponent of Acceptance Test Driven Development (ATDD) and writing English specs that actually test the system.  Hickey says that's just silly, and recommends using a rules engine outside your system.  :(

And finally, and this is the biggest one, he says: 
"Information IS simple.  The only thing you can possible do with information is RUIN it!  Don't do it!  We got objects, made to encapsulate IO devices.  All they're good for is encapsulating objects: screens and mice.  They were never supposed to be applied to information!  And when you apply them to information, it's just wrong.  And it's wrong because it's complex.  If you leave data alone, you can build things once that manipulate data, and you can reuse them all over the place and you know they are right.  Please start using maps and sets directly."
Um, yeah, ouch.  I'm an object oriented developer.  I read DDD and POEAA three years ago and got really excited about representing all my information as objects!  We extensively prototyped data access layers, Entity Framework and NH chief among them.  We settled on NH.  Worked with it for a while but found it too heavy-handed.  It hid too much of SQL and clung too much to persistence ignorance.  But I couldn't really understand how to use a Micro-ORM like Massive (or Dapper or PetaPoco) because I was too hung up on the idea of Domain Objects.  So we spiked an ORMish thing that used Massive under the covers.  It supported inheritance and components and relationships via an ActiveRecord API.  It gave us the flexibility to build the unit testing I always wanted (which I recently blogged about).  It is still working quite well.  But it's information represented as objects.  So it's wrong...

In case you didn't pick up on it, Rich Hickey wrote Clojure, a functional language.  I don't know anything about functional programming.  I've been meaning to learn some F#, but haven't gotten that into it yet.  So it doesn't really surprise me that Hickey would think everything I think is wrong.  Functional vs. OOP is one of the biggest (and longest running) debates in our industry.  I think it is telling that I've felt enough pain to blog about lots of the things that Hickey is talking about.  But I don't find it disheartening that his conclusions are different than mine.  It is possible that he is right and I am wrong.  It is also possible that we are solving different problems with different tools with different risks and vectors of change and different complexities.  Or, maybe I really should get rid of all my active record objects and just pass dictionaries around!

In any case, this certainly was a very eye opening presentation.

Monday, March 5, 2012

Database Seed Data

Almost two years ago, we started a major new development effort at Pointe Blank.  We had the extreme fortune to be starting some new projects on a completely new platform (namely ASP.NET MVC instead of WinForms), which gave us the opportunity to take all the things we'd learned and start from scratch.

One of the main things we had learned was the importance of designing for automation.  Probably the single most valuable automation-related thing we did was automate the scripting of our database.  We are using FluentMigrator to define migrations and psake with sqlcmd to create and drop database instances.  I'm not all that crazy about FluentMigrator (its syntax is overly verbose, its code seems overly complex) but it has worked very well for us so far.

We use this tool in a few different circumstances:

  1. To apply other people's changes to our local dev databases
  2. To migrate our QA environment
  3. To spin up from scratch (and tear down) automated test environments
  4. To deploy database changes to production
Database migrations have made working with our code base so dramatically easier, it's really amazing.  You can spin up a new environment with one command in seconds.  Incorporate other devs' changes without a second thought.

Migration tools pay most of their attention to schema changes, but data gets largely ignored.  So much so that we had to rough in our own approach to dealing with data, which works, but also sucks.  And there don't seem to be any clear recommendations for data.  There are some categories of data I'm concerned with:
  1. List Data: these are values that are usually displayed in drop downs and are either not user generated, or there is a default set we want to seed (ex: states, name suffixes, record types, etc)
  2. Configuration Data: this is data that is required for the system to work, (ex: server uris, an email address to send from, labels to display in the app, etc)
There are only two approaches to dealing with seed data that I'm aware of:
  1. Run the schema migrations, then insert all the seed data
  2. Insert the seed data inside the migrations
At first blush #1 seemed easier, so it's the approach we originally took, but it has some drawbacks. The first challenge is to avoid duplicates.  You can do that with IF NOT EXISTS(...) statements, or by inserting Primary Keys and ignoring PK violation errors.  The second is dealing with schema changes to tables with seed data.

For example, suppose you write a migration to create a table, and you write insert statements to add seed data to it.  You have already run your migrations and seeded the database.  Now you write a new migration which adds a new not null column to the table.  This migration will fail because the table already has data from the seeds and the new column requires a value.  In this situation, you're hosed.  You have to write the migration to add data into that column before making the column not allow nulls (or use a default value).  Effectively you have to duplicate the seed data for that column in both the migration and the seed data file.  You need it in both places in order to support creating the database from scratch and migrating an existing database.

#2 seems to have its own downsides too.  There is no one place that shows you all the data that will be seeded in a given table; it's spread throughout the migrations.  It precludes you from having a schema.rb type of system (unless that system could somehow be smart enough to include data).  That's somewhat academic though, because FluentMigrator doesn't have anything like this.

However, #2 is much more flexible and can accommodate anything you could ever dream of doing with data (that SQL would support of course).  And I feel that pure #2 would be better than a #1/#2 hybrid, because it's just more straightforward.
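To make #2 concrete, here's a minimal sketch of what seeding inside a migration might look like with FluentMigrator.  The table and values are made up for the example, and the exact fluent syntax may vary a bit by FluentMigrator version:

[Migration(201203050001)]
public class AddRecordTypeSeedData : Migration
{
  public override void Up()
  {
    // Seed the list data in the same migration that needs it, so a from-scratch
    // database and a migrated existing database end up with identical data.
    Insert.IntoTable("RecordTypes").Row(new { Name = "Incident" });
    Insert.IntoTable("RecordTypes").Row(new { Name = "Arrest" });
  }

  public override void Down()
  {
    Delete.FromTable("RecordTypes").Row(new { Name = "Incident" });
    Delete.FromTable("RecordTypes").Row(new { Name = "Arrest" });
  }
}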

And now I'd like to ask you, do you know of a good way of dealing with seed data?  Are there any existing tools or strategies out there that you could link me to?  

Monday, February 27, 2012

Better mocking for tests

The common approach I've seen to TDD and SOLID design in static languages requires a lot of interfaces and constructors that accept those interfaces: dependency injection.

If you only develop in a static language, you probably don't see anything wrong with this.  Spend some time in a dynamic language though, and these overly indirect patterns really start to stand out and look like a lot of extra work.

Here's the rub.  When your code legitimately needs to be loosely coupled, then IoC, DI, and interfaces rock.  But when it doesn't need to be loosely coupled for any reason OTHER THAN TESTING, why riddle the code with indirection?

Indirection costs.  It takes time, adds significantly more files to your code base, adds IoC setup requirements, forces more complicated design patterns (like Factories), and most importantly makes code harder to understand.

Testing is important.  So if we have to make our code more abstract than strictly necessary to test, it's worth it.  But there are techniques other than DI that we can use to make our code unit testable while still reducing the amount of abstraction!

Limited Integration Testing
The first way to unit test without interfaces is to not mock out the object-under-test's dependencies.  To not unit test.  I'm not talking about doing a full stack test though.  You can still mock out the database or other third party services, you just might not mock out your object's direct dependencies.
A -> B -| C
  -> D -| E
Here A depends on B and D which depend on C and E.  We've mocked out C and E but not B and D.

The benefit of this approach is that there is absolutely no indirection or abstraction added to your code.  The question is what should you test?  I'm still thinking of this as a unit test, so I really only want to test class A.  It's still going to be calling out to B and D though.  So what I want to do is write the test so that it's focused on verifying the behavior of A.  I'll have separate unit tests for B and D.

I'll reuse the example from Demonstrating the Costs of DI again.  If I wanted to test PresentationApi I could mock out the database layer but still let it call into OrdersSpeakers.  In my test I need to verify that PresentationApi did in fact cause the speakers to be ordered, but I don't care if they were ordered correctly.  I can do this by verifying that the speaker I added has its Order property set.  I don't care what it is set to, as long as it's set I know PresentationApi called OrdersSpeakers.  The OrdersSpeakers tests will verify that the ordering is actually done correctly.
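A sketch of what that test might look like (the constructor, the Name property, and the shape of Speaker.Order are assumptions for illustration, not the real code):

[Test]
public void TestAddSpeakerCausesSpeakersToBeOrdered()
{
  // Assume setup has mocked out the database layer, but left the real OrdersSpeakers in place.
  var speaker = new Speaker { Name = "Kevin" };
  var presentationApi = new PresentationApi();

  presentationApi.AddSpeaker(speaker);

  // Don't assert on what the order actually is; OrdersSpeakers' own tests cover that.
  // We only care that some order was assigned, proving PresentationApi called OrdersSpeakers.
  Assert.IsTrue(speaker.Order >= 1);
}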

The downside of this technique is that your test must have some knowledge about the object-under-test's dependencies and frequently those objects' dependencies.  You might expect these tests to be brittle, but surprisingly I haven't had any problems with that.  It's more just that to conceptually understand and write the test, you have to think about the object's dependencies.

Static Funcs
I learned this technique from a blog post Ayende wrote some time ago.  And he recently showed another example.  In this technique you expose a static func which is set to a default implementation.  The system code calls the func, which wraps the actual implementation.  But the tests can change the implementation of the func as needed.

I'll use an example I think is probably common enough that you'll have had code like this somewhere: CurrentUser.
public static class CurrentUser
{
  public static Func<User> GetCurrentUser = GetCurrentUser_Default;

  public static User User { get { return GetCurrentUser(); } }

  private static User GetCurrentUser_Default()
  {
    // ... does something to find the user in the current context ...
  }

  public static void ResetGetCurrentUserFunc() { GetCurrentUser = GetCurrentUser_Default; }
}

[TestFixture]
public class DemoChangingCurrentUser
{
  [TearDown]
  public void TearDown() { CurrentUser.ResetGetCurrentUserFunc(); }

  [Test]
  public void TestChangingCurrentUser()
  {
    var fakeCurrentUser = new User();
    CurrentUser.GetCurrentUser = () => fakeCurrentUser;
    
    Assert.AreEqual(fakeCurrentUser, CurrentUser.User);
  }
}
The previous approach could not have handled a case like this, because here the test actually needs to change the behavior of its dependencies.  The benefit of this approach is that we have poked the smallest possible hole in the code base to change only what the test needs to change.  And also, the added complexity is constrained within the class it applies to.

But the biggest downside is exemplified by the TearDown method in the test.  Since the func is static, it has to be reset to the default implementation after the test runs, otherwise other tests could get messed up.  It's nice to build this resetting INTO your test framework.  For example, a test base class could reset the static funcs for you.
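For example, a minimal sketch of building that into a base fixture with NUnit (assuming the CurrentUser class above):

[TestFixture]
public abstract class TestBase
{
  [TearDown]
  public virtual void ResetStaticFuncs()
  {
    // Put every static func back to its default so one test can't bleed into another.
    CurrentUser.ResetGetCurrentUserFunc();
  }
}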

Built In Mocking
This is by far my favorite technique, which is why I saved it for last ;)  I first saw this in practice with the Ruby on Rails ActionMailer.  The idea is that you don't actually mock, stub, or substitute anything.  Instead the framework you are calling into is put in test mode.  Now the framework will just record what emails were sent but not actually send them.  Then the test just queries the sent mails to verify what mail the code attempted to send.

In the case of ActionMailer, every email you try to send gets stored in an array, ActionMailer::Base.deliveries.  The tests can just look in this array and see if the expected email is there, as well as verify that the email is in the correct format.

With the right mindset this can be applied in all kinds of places.  The most powerful place I've found is to the database layer.  The database access layer we wrote at Pointe Blank includes this style of mocking/testing.  So for example, a typical unit test that mocks out the database in our codebase might look something like this:
[Test]
public void TestAddSpeaker()
{
  ...
  presentationApi.AddSpeaker(speaker);

  TestLog.VerifyInserted(speaker);
}

The upsides to this technique are really awesome.  First off, there are no interfaces or dependency injection between the code and the persistence layer.  Yet the test can still "mock out" the actual persistence.

Secondly, there is no ugly mocking, stubbing, or faking code in your tests.  Instead of testing that certain method calls occurred, you are simply verifying results!  By looking in the TestLog, you're verifying that the actions you expected really did occur.

Thirdly, the TestLog becomes a place where a sophisticated query API can be introduced allowing the test to express exactly what it wants to verify as succinctly as possible.

Fourthly, this code is completely orthogonal to any changes that may happen in how a speaker is saved, or how the actual persistence framework works.  The code is only dependent on the TestLog.  This means the test will not break if irrelevant changes occur in the code.  It is actually written at the correct level of abstraction, something that is frequently very difficult to achieve in unit tests.
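Our TestLog is part of our own persistence layer, but to give a feel for the idea, a stripped-down sketch might look something like this (the real thing has a much richer query API than shown here):

public static class TestLog
{
  private static readonly List<object> inserted = new List<object>();

  // The persistence layer calls this instead of hitting the database when in test mode.
  public static void RecordInsert(object entity) { inserted.Add(entity); }

  public static void VerifyInserted(object entity)
  {
    if (!inserted.Contains(entity))
      Assert.Fail("Expected an insert of: " + entity);
  }

  public static void Clear() { inserted.Clear(); }
}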

The only downside isn't even a downside to this technique.  The downside is that since built in mocking is so great, you may have a tendency to use the limited integration testing approach more than you should.  You may find yourself integration testing everything down to the database layer because the database layer is so easy to mock out.  Sometimes that's OK, but sometimes you really shouldn't test through a class dependency.

So those are three techniques I've discovered so far for writing unit tests without riddling my code with interfaces, dependency injection, and IoC. Applying these allows me to keep my code as simple as possible. I only need to apply indirection where it is actually warranted. There is a trade off, as I pointed out in the downsides of these techniques, namely in that your tests may be a bit more error prone. But I feel those downsides are more than acceptable given how much simpler the overall code base becomes!

Monday, February 20, 2012

Meaning Over Implementation


Conveying the meaning of code is more important than conveying the implementation of code.

This is a lesson I've learned that I thought was worth sharing.  It may sound pretty straightforward, but I've noticed it can be a hurdle when you're coming from a spaghetti code background.  Spaghetti code requires you to hunt down code to understand how the system works because the code is not 'meaningful.'

There are really two major components to this.  The first is, if you're used to spaghetti code, you've been trained to mistrust the code you're reading.  That makes it hard to come to terms with hiding implementation details behind well named methods, because you don't trust those methods to do what you need unless you know every detail of HOW they do it.

The second is in naming.  Generally I try to name methods so that they describe both what the method does as well as how it does it.  But this isn't always possible.  And when it's not possible, I've learned to always favor naming what the method is for as opposed to how the method is implemented.
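A trivial made-up example of the difference:

public class TaskService
{
  // Named for WHAT it's for.  The alternative, naming it for HOW it works
  // (say, SendSmtpEmailToAssignee), forces every caller to care about a detail
  // that should stay behind the curtain.
  public void NotifyAssignee(int taskId)
  {
    // ... look up the assignee and send the notification however we currently do it ...
  }
}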

There are a number of reasons why this works out.  The first is that issue of trust.  If your code base sucks, you're forced to approach every class and every method with a certain amount of mistrust.  You expect it to do one thing, but you've been bitten enough times to know it's likely it does some other thing.  If this is the kind of code base you find yourself in, you're screwed.  Every programming technique we have at our disposal relies on abstraction.  And abstraction is all about hiding complexity.  But if you don't trust the abstractions, you can't build on them.

The other reason this works is the same reason SRP (the Single Responsibility Principle) works.  You can selectively pull back the curtain on each abstraction as needed.  This allows you to understand things from a high level, but then drill down one layer at a time into the details.  I like to think of this as breadth-first programming, as opposed to depth-first programming.

So when you're struggling with naming things, remember your first priority should be to name things so that others will understand WHAT it does.  They can read the implementation if they need to understand HOW it does it.

Friday, February 17, 2012

Abstraction: Blessing or Curse?

Well this has been a fun week!  It's the first time I've ever done a series of posts like this, and all in one week.  But now it's time to wrap up and then rest my fingers for awhile.

All of this stuff has been about dealing with abstraction.  Seriously, think about what you spend most of your time doing when you're programming.  For me, I think it breaks down something like this:
  • 50%: Figuring out why a framework/library/control/object isn't doing what I wanted it to do
  • 30%: Deciding where to put things and what to name them
  • 20%: Actually writing business logic/algorithms
So, yeah, those numbers are completely made up, but they feel right...  And look at how much time is spent just deciding where things should go!  Seriously, that's what "software engineering" really is.  What should I name this?  Where should I put this logic?  What concepts should I express?  How should things communicate with each other?  Is it OK for this thing to know about some other thing?

This is abstraction.  You could just write everything in one long file, top to bottom, probably in C.  We've all been there, to some degree, and it's a nightmare.  So we slice things up and try to hide details.  We try to massage the code into some form that makes it easier to understand, more intuitive.  Abstraction is our only way to deal with complexity.

But abstraction itself is a form of complexity!  We are literally fighting complexity with complexity.

I think this is where some of the art of programming comes into play.  One of the elements of beautiful code is how it's able to leverage complexity but appear simple.

All of our design patterns, principles, and heuristics (from Clean Code) can largely be viewed as suggestions for "where to put things and what to name them" that have worked for people in the past.  It's more like an oral tradition passed from generation to generation than it is like the strict rules of Civil Engineering.  This doesn't make any of these patterns less important, but it does help explain why good design is such a difficult job, even with all these patterns.

In this series of posts I've blabbed on about a number of different design topics, each time almost coming to some kind of conclusion, but never very definitively.  That's because this has been sort of a public thought experiment on my part.  I'm just broadcasting some of the topics that I have been curious about recently.  And I think where I would like to leave this is with one final look at the complexity of abstraction.

Abstraction, by definition, is about hiding details.  And anyone who has ever done any nontrivial work with SQL Server (or any database) will understand how this hiding of details can be a double edged sword.  If you had to understand everything in its full detail, you could never make it work.  But sometimes those hidden details stick out in weird ways, causing things to not work as you were expecting.  The abstraction has failed you.  But this is inevitable when hiding details: at some point, one of those details will surprise you.

Indirection and abstraction aren't exactly the same thing, but they feel very related somehow, and I think Zed Shaw would forgive me for thinking that.  Indirection, just like abstraction, is a form of complexity.  And just like abstraction, it can be incredibly useful complexity.  I assert that every time you wrap an implementation with an interface, whether it's a Header Interface or an Inverted Interface, you have introduced indirection.

In Bob Martin's Dependency Inversion Principle article, he lays out 3 criteria of bad design:

  1. Rigidity: Code is hard to change because every change affects too many parts of the system
  2. Fragility: When you make a change, unexpected parts of the system break
  3. Immobility: Code is difficult to reuse because it cannot be disentangled from the current application

I think there is clearly a 4th criterion, which is Complexity: Code is overly complicated, making it difficult to understand and work with.  Every move in the direction of the first three criteria is a move away from the fourth.  Our challenge then, indeed our art, is to find the sweet spot between these competing extremes.

And so I ask you, is it a beautiful code type of thing?

Thursday, February 16, 2012

DIP: Loose or Leaky?


In the last post I looked a bit at the Dependency Inversion Principle which says to define an interface representing the contract of a dependency that you need someone to fulfill for you. It's a really great technique for encouraging SRP and loose coupling.

The whole idea behind the DIP is that your class depends only on the interface, and doesn't know anything about the concrete class implementing that interface.  It doesn't know what class it's using, nor does it know where that class came from.  And thus, we gain what we were after: the high level class can be reused in nearly any context by providing it with different dependencies.  And on top of that, we gain excellent separation of concerns, making our code base more flexible, more maintainable, and I'd argue more understandable.  Clearly, the DIP is awesome!

But!  I'm sure you were waiting for the But...  Since we now have to provide our object with its dependencies, we have to know:
  • Every interface it depends on
  • What instances we should provide
  • How to create those instances
This feels so leaky to me!  I used to have this beautiful, self contained, well abstracted object.  I refactor it to be all DIP-y, and now it's leaking all these details of its internal implementation in the form of its collaborators.  If I had to actually construct this object myself in order to use it, this would absolutely be a deal breaker!

Fortunately, someone invented Inversion of Control Containers, so we don't have to create these objects ourselves.  Or, maybe unfortunately, 'cause now we don't have to create these objects ourselves, which sweeps any unsavory design issues away where you won't see them...

What design issues?  Well, the leaking of implementation details obviously!  Are there others?  Probably.  Maybe class design issues with having too many dependencies?  Or having dependencies that are too small and should be rolled into more abstract concepts?  But neither of these is a result of injecting Inverted Interfaces, only the implementation leaking.

I do believe this is leaky, but I'm not really sure if it's a problem exactly.  At the end of the day, we're really just designing a plugin system.  We want code to be reusable, so we want to be able to dynamically specify what dependencies it uses.  We're not forced to pass the dependencies in, we could use a service locator type approach.  But this has downsides of its own.

In the next post, I'll wrap this up by zooming back out and trying to tie this all together.

Wednesday, February 15, 2012

Header Interfaces or Inverted Interfaces?


Thanks to Derick Bailey for introducing me to this concept: Header Interfaces.  A Header Interface is an interface with the same name as the class, which exposes all of the class's methods, is generally not intended to be implemented by any other classes, and is usually introduced just for the purposes of testing.  Derick compared these to header files in C++.  In my earlier Interfaces, DI, and IoC are the Devil post, these are really the kinds of interfaces I was railing against.  (I'll use the ATM example from Bob Martin's ISP article throughout this post)

public class AtmUI : IAtmUI {...}

One of the things I really found interesting about this was the idea that there are different kinds of interfaces.  So if Header Interfaces are one kind, what other kinds are there?

The first things that came to my mind were the Interface Segregation Principle and the Dependency Inversion Principle.  ISP obviously deals directly with interfaces.  It basically says that interfaces should remain small.  "Small" is always a tricky word, in this case it's about making sure that the clients consuming the interfaces actually use all the methods of the interface.  The idea being that if the client does not use all the methods, then you're forcing the client to depend on methods it actually has no use for.  Apparently this is supposed to make things more brittle.

I said "apparently" because I've never directly felt this pain.  I guess it really only comes into play if you have separate compilable components and you are trying to reduce how many things have to be recompiled when you make a change.  I use .NET, so Visual Studio, and it's not a picnic for controlling compilable components...  I think I have this requirement though, and just haven't figured out how to deal with it in .NET.  But for the moment, lets assume we agree that ISP is a good thing.

public class AtmUI : IDepositUI, IWithdrawalUI, ITransferUI {...}

The DIP leads to a similar place as ISP.  It tells us that higher level components should not depend on lower level components.  There is some room for uncertainty here around which components are higher than others.  For example, is the AtmUI a higher level than the Transaction?  I'll go with no, because the Transaction is the actual driver of the application, the UI is just one of its collaborators.  Because of this, the DIP leads us to create separate interfaces to be consumed by each Transaction:

public class AtmUI : IDepositUI, IWithdrawalUI, ITransferUI {...}

So, maybe there are at least two types of interfaces: Header Interfaces, and what I'll coin Inverted Interfaces.  In the last post I talked about the "Service Layer" pattern.  It generally leads to the creation of what feel more like Header Interfaces.  But this is tricky, because the only difference I can really find here is based on who owns the interface.  An Inverted Interface is owned by the class that consumes the interface, and a Header Interface is owned by the class that implements the interface.

But sometimes the difference isn't really that clear cut.  If you're TDDing your way through an application top-down in the GOOS style, the Service Layers are designed and created based on the needs of the "higher" level components.  So the component and its interface both spring into existence at the same time.  So if the service only has one consumer right now, the interface feels very Header-y.  On the other hand, it was created to fulfill the need of a higher level component; very Inverted-y.

But if someone else comes around and consumes the same service later: well now we have some thinking to do. If we reuse the interface, then I guess we've made it a Header Interface.  Would Uncle Bob have said to create a new interface but make the existing service implement it?  The lines are blurred because the components we're dealing with all reside within the same "package" and at least right now don't have any clear call to be reused outside this package.

Sadly, the introduction of these interfaces brings us back to Dependency Injection.  So in the next post, I'll look at the Dependency Inversion Principle, and the consequences of these Inverted Interfaces.

Tuesday, February 14, 2012

Service Layers or OOP?

In the last post, I mentioned that my co-workers and I had settled on a design that was working very well for us, but that wasn't very "object-oriented," at least not in the way Bob Martin described it.

We evolved to our approach through a lot of TDD and refactoring and plain old trial and error, but the final touches came when we  watched Gary Bernhardt's Destroy All Software screencasts and saw that he was using pretty much the same techniques, but with some nice naming patterns.

I don't know if there is a widely accepted name for this pattern, so I'm just going to call it the Service Layer Pattern.  Its biggest strengths are its simplicity and clarity.  In a nutshell I'd describe it by saying that for every operation of your application, you create a "service" class.  You provide the necessary context to these services, in our case as Active Record objects (NOTE: service does NOT mean remote service (ie, REST), it just means a class that performs some function for us).
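As a rough sketch (using names from my DI example posts, with everything else made up for illustration), a service is just a small class named after the operation it performs:

public class OrdersSpeakers
{
  // One operation of the application, in one class, named after what it does.
  public void Order(IList<Speaker> speakers)
  {
    var order = 1;
    foreach (var speaker in speakers)
      speaker.Order = order++;
  }
}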

So far so basic, the real goodness comes when you add the layering.  I find there are a couple ways to look at this.  The more prescriptive is similar to DDD's (Domain Driven Design) "Layered Architecture" which recommends 4 layers: User Interface, Application, Domain, and Infrastructure.  From DDD:
The value of layers is that each specializes in a particular aspect of a computer program.  This specialization allows more cohesive designs of each aspect, and it makes these designs much easier to interpret.  Of course, it is vital to choose layers that isolate the most important cohesive design aspects.
In my Demonstrating the Costs of DI code examples the classes and layers looked like this:
SpeakerController (app)
 > PresentationApi (app)
   > OrdersSpeakers (domain)
   > PresentationSpeakers (domain)
     > Active Record (infrastructure)
   > Speaker (domain)
     > Active Record (infrastructure)
This concept of layering is very useful, but it's important not to think that a given operation will only have one service in each layer.  Another perspective on this that is less prescriptive but also more vague is the Single Responsibility Principle.  The layers emerge because you repeatedly refactor similar concepts into separate objects for each operation your code performs.  It's still useful to label these layers, because it adds some consistency to the code.

Each of these services is an object, but that doesn't make this an object-oriented design.  Quite the opposite, this is just well organized SRP procedural code.  Is this Service Layer approach inferior to the OOP design hinted at by Uncle Bob?  Or are these actually compatible approaches?

The OOP approach wants to leverage polymorphism to act on different types in the same way.  Does that mean that if I have a service, like OrdersParties, that I should move it onto the Party object?  What about the PartyApi class, should I find some way of replacing that with an object on which I could introduce new types?

There is a subtle but important distinction here.  Some algorithms are specific to a given type: User.Inactivate().  What it means to inactivate a user is specific to User.  Contrast that with User.HashPassword().  Hashing a password really has nothing to do with a user, except that a user needs a hashed password.  That is, the algorithm for hashing a password is not specific to the User type.  It could apply to any type, indeed to any string!  Defining it on User couples it to User, preventing it from being used on any string in some other context.
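A sketch of that distinction (illustrative code, not from the real app):

public class User
{
  public bool IsActive { get; private set; }

  // Specific to User: what it means to inactivate a user is defined by this type.
  public void Inactivate() { IsActive = false; }
}

// Not specific to User: it works on any string, so it stands on its own
// instead of being coupled to the User type.
public static class PasswordHasher
{
  public static string Hash(string plainText)
  {
    // ... whatever hashing scheme the app actually uses goes here ...
    return plainText;  // placeholder so the sketch compiles
  }
}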

Further, some algorithms are bigger than a single type.  Ordering the speakers on a presentation doesn't just affect one speaker, it affects them all.  Think how awkward it would be for this algorithm to reside on the Speaker object.  Arguably, these methods could be placed on Presentation, but then Presentation would have a lot of code that's not directly related to a presentation, but instead to how speakers are listed.  So it doesn't make sense on Speaker, or on Presentation.

Some algorithms are best represented as services, standing on their own, distinctly representing their concepts.  But these services could easily operate on Objects, as opposed to Data Structures.  Allowing them to apply to multiple types without needing to know anything about those specific types.  So I think the Service Layers approach is compatible with the OOP approach.

In the next post I'll take a look at how interfaces fit into this picture.

Monday, February 13, 2012

Objects or Data Structures?

Here's a great (and old) article from Bob Martin called Active Record vs Objects.  You should read it.  I think it might be one of the best treatments of the theoretical underpinnings of Object Oriented design I've read, especially because it pays a lot of heed to what OOP is good at, and what it's not good at.

Here's some of my highlights:
  • Objects hide data and export behavior (very tell-don't-ask)
  • Data structures expose data and have no behavior
  • Algorithms that use objects are immune to the addition of new types
  • Algorithms that use data structures are immune to the addition of new functions
  • Apps should be structured around objects that expose behaviors and hide the database
This all feels right and stuff, but it's all pretty theoretical and doesn't help me decide if my code is designed as well as it could be.  And that's what I'm going to be writing about.  In one post a day for the rest of the week I'll look at various elements of "good design," and try to fit the pieces together in some way I can apply to my code.
Good designers use this opposition to construct systems that are appropriately immune to the various forces that impinge upon them.
He's talking about the opposition between objects and data structures in terms of what they're good for.  So apparently a good designer is a psychic who can see the future.

But that is the hard part, how do you know where you'll need to add "types" vs. where you'll need to add "functions"?  Sometimes it's really obvious.  But what I'm starting to think about is, maybe I need to get more clever about the way I think about types.  Because if Uncle Bob thinks apps should be structured around objects, that means he thinks there are lots of examples where you're going to need to add a new type.  Whereas, when I think about my code, I'm not really finding very many examples where I could leverage polymorphism to any useful effect.

This could simply be because the problems I'm solving for my application are simply better suited for data structures and functions.  Or it could be because I'm just not approaching it from a clever enough OO angle.

Recently, my co-workers and I had pretty well settled on a design approach for our application, and it has been working extremely well for us.  However, this article and its clear preference for objects and polymorphism has me wondering if there may be another perspective that could be useful.  I'll talk more about this in the next post.

Sunday, February 5, 2012

Demonstrating the Costs of DI

Here's a followup to my previous post "Interfaces, DI, and IoC are the Devil"

This time I want to demonstrate a little of what I'm talking about with some code samples.  I have contrived an example.  I wanted something that is real, so I took the code structure and the responsibilities from some real code in one of our applications.  But I changed the names to make it more easily understood.  I also removed some of the concerns, such as transaction handling and record locking, just to make it shorter.  I'm trying to be fair, so the example is not trivial, but it is also not complicated.

Note that I didn't compile any of this code, so please excuse typos or obvious syntactic errors I might have overlooked.

Pretend we are writing an application to manage a conference.  It's an MVC 3 C# web app.  We have presentations, and presentations have speakers.  The speakers are ordered (1, 2, 3, etc).  As an optimization, we will cache some of the information about the speakers for a given presentation: how many there are and who the #1 ordered speaker is.  This information will be cached on the PresentationSpeakers table.

The structure of the code is as follows.  The SpeakerController's Create action method is called to add a new speaker to a presentation.  This controller delegates the job to the PresentationApi class.  This class deals with coordinating the various domain services, and in real life would have dealt with database transactions and record locking/concurrency.  PresentationApi delegates to OrdersSpeakers (which obviously assigns the order numbers to the speakers) and PresentationSpeakers (which caches information about the speakers on a presentation).  Finally the Speaker and Presentation classes are active record objects.

This first example demonstrates the simplest way in which I would like to write this code.  There are no interfaces, there is no constructor injection, and it is using the Active Record pattern for database persistence.

The next example addresses the "problem" of the controller having a direct dependency on the PresentationApi class by adding IPresentationApi and injecting it through the controller's constructor.  Notice that I also convert PresentationApi to be a singleton at this point and remove its instance variables.  This isn't strictly required, but it is typical.  Notice how I now have to pass the presentationId into the LoadPresentationSpeakersRecord helper method.

In the third example, I remove PresentationApi's direct dependency on OrdersSpeakers.

Finally I eliminate ActiveRecord and replace it with the most evil pattern in the world, the Repository pattern. I chose to implement this in the way I've most commonly seen, with one interface per database table (give or take).

So, if you had your choice, which of these versions would you prefer to have in your code base? Which would you prefer to have to debug, understand, and maintain?

My answer, not surprisingly, is #1.  Notice how each version adds more and more cruft into the code and obscures what the code is actually trying to accomplish.  This is the cost of indirection.  It's really the same reason people like to fly the YAGNI flag.  And perhaps you've heard the old adage, "Do the simplest thing that could possibly work."  I desperately want as few constructs between me and what my code does as possible.  Which is why I yearn for the simple code in example #1.

PS.  If you actually read those code samples and studied the differences between them, pat yourself on the back!  I know this is very difficult to follow.  I actually wasn't intending to blog it, I just wanted to go through the motions myself and see how it turned out.  There have been quite a few times where I wrote something off because of my image of how it would turn out.  But then when I actually did it, it turned out much differently (MVC, MVP, and MVVM were all like that for me).  But in this case, it turned out just how I'd imagined it...  Crufty.

Wednesday, February 1, 2012

Interfaces, DI, and IoC are the Devil

I want to write unit tests. To do that, I need to be able to swap out the "dependencies" of the object under test with mocks, or stubs, or fakes. In static languages like C# or Java this is accomplished by two means:
  1. Using interfaces instead of the concrete class
  2. Injecting instances of those interfaces through the constructor (Dependency Injection)
There are other ways, but this seems to be the widely regarded "best" way.  And I don't like it.

The core of my dislike stems from one simple thing: interfaces don't describe constructors.  So once I'm forced to depend on interfaces, I can no longer use the programming language to create objects.  I can't use the new keyword.  The simplest most fundamental feature of all objects, creating them, is barred from me.

To be clear, this isn't Dependency Injection's fault, nor is it IoC's fault.  It's the interface's fault.  Here's my favorite feature of Ruby:
MyClass.Stub(:new) {...}
That simple line of code stubs out the new 'message' to the MyClass type, allowing me to return a stub instead of an actual instance of MyClass.  This demonstrates why Ruby is so much easier to test than today's static languages:
  1. You can intercept any message to the object w/o having to depend on an interface
  2. You can intercept the constructor the same way you'd intercept any other message
But back to topic, why is it a problem that I can't use the new method?  In my experience this causes some nasty coding practices to emerge.  I'll take them in turn:

Everything is a singleton
As soon as I couldn't create instances, I wrote all of my objects in a stateless style (no instance variables) and registered them as singletons in the IoC container.  This leads to all sorts of other code smells within that object.  Since you can't promote variables to instance variables, you end up passing all your state into methods, leading to methods with lots of parameters, which is a Clean Code smell.  Methods passing parameters to other methods to other methods to other methods...

Custom "init" methods are introduced
To combat the last problem, you might add an initialize method to take the place of the constructor.  This will work, though it's non-standard and confusing.  You also have to decide how to register the class in the IoC container.  I've seen this done with a singleton, where the Init method clears the old state and sets up the new state WHICH IS EVIL.  Or you can have it create a new instance each time, but you'll never know how it's configured when you're using it, more on that later.

Classes that could benefit from refactoring to have instance variables, don't get refactored
Both of the above contribute to this problem.  When you add that next "context" parameter to your stateless class, it may be the straw that breaks the camel's back, causing you to refactor those parameters to instance variables.  But time and time again I've seen the uncertainty around the lifecycle of the class in IoC lead people to delay this refactoring.  Again, more on the lifecycles later.

Factory interfaces are created to wrap the constructors
Finally, when I'm fed up with the above, I introduce a factory class whose sole purpose is to wrap the constructor of the class I actually want.  This is just bloat, plain and simple.  And it's yet another class that gets registered with the IoC container.
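For example (illustrative names, assuming an OrdersSpeakers class whose constructor takes a presentationId), all of this exists just to get access to a constructor again:

public interface IOrdersSpeakersFactory
{
  OrdersSpeakers Create(int presentationId);
}

public class OrdersSpeakersFactory : IOrdersSpeakersFactory
{
  public OrdersSpeakers Create(int presentationId)
  {
    return new OrdersSpeakers(presentationId);
  }
}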

I also have some complaints about DI and IoC when they are leveraged simply to enable unit testing.  I'd like to call into question some of the typical assumptions around DI/IoC, and we'll see where it takes me.

Interfaces are good because they make your code loosely coupled
This is the most common assumption I see that I disagree with.  I've seen this in Fowler's writing on DI and in the GOOS book, which makes me seriously doubt myself here.  But no amount of time or practice has changed my opinion.  The argument is that it's good to use interfaces over ALL OF YOUR CLASSES because it allows you to make them ALL PLUGGABLE.

I find this completely ridiculous!  There are certainly classes that it is useful to have be "pluggable" but they are few and far between.  The majority of the classes that I create are Single Responsibility, extracted from other classes, and small.  They serve a very pointed, very well understood, and very well defined purpose in the larger web of objects.  The chances I would want to PLUG IN a different version of this class are virtually zero (aside from for testing).  I might change it as the design of my domain changes over time, and that will be easy because it's small and focused.  But I'm not going to suddenly wish it was pluggable!  I might even argue that wrapping these things in interfaces hurts cohesion.

I should clarify: I do want to swap these classes out so I can test, and that's why I'm forced to use interfaces.  But the argument that using interfaces on all classes for their own sake, even apart from testing, is a good thing is one I just find outrageous.  It's an unnecessary level of indirection, adding more complexity, and giving me nothing useful in return: YAGNI.

IoC is good because it makes the lifecycle of my objects someone else's problem
This is another argument I've frequently seen in favor of IoC.  That moving the responsibility for the "lifecycle" of your objects all the way up to the top of your application is a good thing.  This is another argument I find mind boggling.  Again, there are a minority of cases where this is useful.  An NHibernate session that needs to be created for each web request comes to mind.

But again, 90% of the time I've got an instance of an object that wants to delegate some task pertaining to its current state to some other object.  I want that object to come into existence, perform some task for me, and go away.  Basically, if I wasn't encumbered by interfaces and DI and IoC, I'd new up the object, passing in the necessary state, and tell it to do what I need.  The garbage collector would take care of the rest.  But when I'm burdened by IoC, suddenly there is a lot of uncertainty around what the lifecycle of each object should be.  And you can't tell what lifecycle a given injected object has.  This is a significant issue that affects how you will use that object, and how that object is designed.

Dependency Injection is good because it makes my object's dependencies obvious
I've encountered this argument in many places too, notably this blog post.  The argument roughly goes: "To unit test an object you need to know what its dependencies are so you can mock them.  But if those dependencies aren't clearly specified in the constructor, how will you know what they are?!  So DI is awesome because it clearly specifies the dependencies!"

This is such nonsense it's really quite amusing.  Knowing what the dependencies are doesn't even BEGIN to help me mock them out.  I have to know what methods are called, with what parameters, expecting what return value, and how many times!  In other words, I have to know everything about how the object uses that dependency to successfully and cleanly mock it out.  A list of what dependencies are used barely scratches the surface and is definitely NOT a compelling reason to use Dependency Injection!

Another downside of IoC/DI that I've often run into is that it spreads like a virus through your code.  Once you start using DI, every class ABOVE that class in the call stack also has to use DI.  Unless you're OK with using the container as a service locator, which few people are.  So essentially, if you're going to use DI you're going to have it in the majority of your classes.

I'm painfully aware that this post is just a lot of complaining and contains no useful suggestions toward correcting any of these issues.  What I want is what Ruby has, but C# just doesn't support that.  I may experiment with some forms of service locator, leveraging extension methods.  Or I might try to create some sort of internally mockable reusable factory concept.  I'll write it up if it leads anywhere, but I doubt it will.

So I guess the only point of this post is to vent some of my frustrations with DI, IoC, and interfaces and get feedback from all of you.  Perhaps you have felt some of this pain too and can validate that I'm not totally crazy?  Or maybe, and I sincerely hope this is true, I'm missing some fundamental concept or perspective on this and you can fill me in?  Or might it even be possible that this could be a first step toward cleaning up some of these pain points, even if just in a small way?

Or you might read all of this and shrug it off.  You're used to it.  It's always been this way, it can't be any other way, go about your business.  If that's your thinking as you're reading this sentence, I'd just encourage you to re-examine your assumptions about what's possible.  Take a look at Bernhardt's testing in DAS.  THEN come back and give me a piece of your mind.

Thursday, January 26, 2012

Weird .NET Regex

I was working on a test for SimpleXml and encountered a really weird regex behavior.

I was trying to have a multiline regex match some xml to verify that it had been updated correctly.  I chose regex just because I thought it would be simpler than using an xml parsing library (other than SimpleXml, and I wasn't sure I liked the idea of using SimpleXml in the SimpleXml tests...).

For example, I was trying to match xml like this:
<root>
  <node>test1</node>
  <node>test2</node>
</root>
with a regex like this:
Regex.IsMatch(xmlString, "<node>test1</node>.*<node>test2</node>", RegexOptions.Multiline);
It should match, but it doesn't.  I tried all kinds of variations, throwing in end-of-line and start-of-line matchers, etc., and nothing worked until I found this:
Regex.IsMatch(xmlString, @"<node>test1</node>.*\s*<node>test2</node>", RegexOptions.Multiline);
For giggles I tried it with .*.* but that doesn't work.  The only pattern I found that worked was .*\s* and I really don't understand why.  So if you can explain why, I'd love to hear it!

update:
Thanks commenters!

Turns out there were 3 things I thought I understood about regex that I didn't:
#1: As explained on regexlib.com \s matches any white-space character including \n and \r.  So that's actually all I needed.  No .* required, and no Multiline option required.
#2: Multiline doesn't change the behavior of .* to make it match newlines like I thought.  It only affects $ and ^, as explained in msdn here.
#3: Singleline is the option that changes the behavior of .* to make it match \n.

So, the final regex I needed was simply:
Regex.IsMatch(xmlString, @"<node>test1</node>\s*<node>test2</node>");
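For completeness, here's a quick sketch of how those options shake out (my own check against the docs, not from the original comments):
var xml = "<root>\n  <node>test1</node>\n  <node>test2</node>\n</root>";

// "." won't cross the newline by default:
Regex.IsMatch(xml, "<node>test1</node>.*<node>test2</node>");                          // false

// Singleline makes "." match \n, so .* can span lines:
Regex.IsMatch(xml, "<node>test1</node>.*<node>test2</node>", RegexOptions.Singleline); // true

// \s happily eats the newline and the indentation:
Regex.IsMatch(xml, @"<node>test1</node>\s*<node>test2</node>");                        // true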

Monday, January 23, 2012

SimpleXml

I released my very first Open Source project this weekend! It's called SimpleXml.  It's a tiny, single-file, 180-line dynamic xml parsing library.  Really it's just a simple wrapper around XElement.

The source, issues, and docs are hosted on bitbucket: https://bitbucket.org/kberridge/simplexml/
And it's published to nuget: https://nuget.org/packages/SimpleXml

SimpleXml was inspired by PowerShell's xml support.  There have been a number of times I've wanted to do some small, simple xml reading/writing job in C# and really wished I had the simplicity of PowerShell's xml api.  Now I do!

You can check out the bitbucket page for more examples, but here's a simple one:
dynamic x = "<root><child>myvalue</child></root>".AsSimpleXml();
Assert.AreEqual("myvalue", x.root.child);
It doesn't get much easier than that!

Hopefully this will prove useful for someone, but my main motivation for creating it was just to have the experience of creating and open sourcing something simple from scratch.  It would be awesome to have the full experience of people forking the repo, and submitting pull requests too!

Saturday, January 14, 2012

CodeMash 2.0.1.2

Josh Schramm did a CodeMash recap.  And in the spirit of maximizing the value of your keystrokes (as presented by Scott Hanselman at CodeMash and blogged about by Jeff Atwood), I thought I'd do the same.

This year was my third CodeMash.  Every year I enjoy my time at CodeMash more than the last.  It was nearly 2x larger this year, but the "feel" of it didn't seem to change at all.

Precompiler
Vital Testing - Jim Weirich
Jim gave a good introductory talk to some of the elements of TDD that are hard.  He asked everyone to rate themselves in these categories, then focus on which categories they wanted to work on while TDDing some katas.

I really enjoyed his insight and perspective on TDD even though it was pretty basic.  But this session was still one of my favorites of the entire conference because Ben Lee spent it teaching me Erlang.  We did the Greed Dice Game kata and came up with this (nearly complete) solution in Erlang.  Erlang totally blew my mind and renewed my interest in functional languages.  I hadn't programmed in this style since college, so it was really awesome to get exposed to it again.

Day 1
Keynote
Keynote was good.  Ted Neward basically talked about being Pragmatic in how you approach building big systems.  He walked a fine line between saying that Enterprise needs to be simplified without ever saying that Enterprise is a bad word.  Also, he swore a lot.

Here are some of the recommendations I really liked:
  1. Resist the temptation of the familiar
  2. Reject the "Goal of Reuse"
  3. Befriend the uncomfortable truth
    1. be cynical
    2. question the assumptions 
    3. look for hidden costs
    4. investigate the implementations
  4. Eschew the "best practice"
  5. Embrace the "perennial gale of creative destruction" (AKA, you will have to learn new things)
  6. Context matters: create an evaluation function of your own for new tech
  7. Attend to goals
Inside the Microsoft Web Stack of Love - Scott Hanselman
Hanselman is an amazing presenter.  His room was overflowing 15 minutes before he was even scheduled to talk, but he kept everyone entertained by first typing funny stuff into notepad, and then playing YouTube videos.

He did a bunch of demos of a bunch of stuff and made one umbrella point: that MS wants to unify all the tools under ASP.NET and encourage devs to combine these tools as needed.  For example, create an app that mixes MVC, WebForms, SignalR, and Web API.  I was most impressed with Web API, which as far as I could gather is just the new WCF REST stuff.  WCF is a really bad word in my office, because WCF was really awful.  We like to say it takes x time to write a WCF service, and 2x time to configure it.  But the new Web API looks a lot like MVC, but without all the attributes!  So it's even cleaner!

Mastering Change With Mercurial - Kevin Berridge
I was very happy with how my talk went.  I probably spent 30+ hours preparing and practicing this talk on one of my favorite subjects: DVCS.  It was a combination of drawings, screenshots, and screen capture videos.  The most memorable part of it for me was how many questions I got.  People were very interested in Mercurial Queues in particular, which is a pretty complicated topic.  So I was glad I'd presented it in a way that obviously aroused people's curiosity enough to want to understand it better.

Functional Alchemy - Mark Rendle
Mark showed a bunch of different functional techniques implemented in C#.  The most memorable were a Memoize implementation, a .AsAsync extension method, and a clever Try Catch trick to DRY up catch blocks.  Just about everything he showed I intend to use at some point in our work projects.
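For reference, a minimal Memoize along those lines might look something like this (my reconstruction from memory, not Mark's actual code):
// Minimal memoize sketch: wraps a Func<TIn, TOut> so each distinct input is
// only computed once, with results cached in a dictionary.
static Func<TIn, TOut> Memoize<TIn, TOut>(Func<TIn, TOut> f)
{
    var cache = new Dictionary<TIn, TOut>();
    return input =>
    {
        TOut result;
        if (!cache.TryGetValue(input, out result))
        {
            result = f(input);
            cache[input] = result;
        }
        return result;
    };
}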

I was able to ask him after his talk if there were any performance concerns with depending on lambda expressions so heavily in C#.  His answer was fascinating.  He said in .NET 3.5, the cost could be non-trivial, but in .NET 4, you could practically wrap every expression in a lambda if you wanted to.

Effective Data Visualization - David Giard
Visualization is a concept I've been really excited about recently, but haven't started to dive into much yet.  This was a fun talk with lots of examples of different visualizations.  And it presented several of Edward Tufte's measures: the Lie Factor (size of the effect shown in the graphic / size of the effect in the data) and the Data-ink ratio (ink used to show data / total ink in the graphic).

Day 2
Dealing with Information Overload - Scott Hanselman
I went to this talk just to be entertained, but I think I will actually get something out of it.  Scott recommended making a list of all your data "inputs" and ranking them in terms of priority to YOU.  Stuff like work email, home email, twitter, facebook, google reader, and even TV.

C# Stunt Coding - Bill Wagner
I learned a couple new things in this talk about .NET's Expression object.  And the first example literally applied to the code I had open in my lap at that very moment, which was such a wonderful coincidence!  It also made me realize I need to spend some time digging through the framework.  They've added so much new stuff in 3.5 and 4.0 and I never took the time to really study the additions, as I figured I'd discover them eventually.

Capability vs Suitability - Gary Bernhardt
I was actually sitting in Applied F# waiting for it to start when Corey Haines tweeted that Bernhardt's talk was so great everyone really needed to attend.  And if you know me, then you know Bernhardt is kind of my hero.  So it was taking all my willpower to stay in the F# talk, even w/ how pumped up I was about functional languages from my previous Erlang experience.  So once Corey Haines piled on, my willpower lost out and I switched to Bernhardt.

It was an interesting talk with some cool history.  I think the biggest take away for me was his discussion of Activity vs. Productivity.  If you type really fast, you are productive at making characters appear on the screen, but that doesn't necessarily mean you are Getting Things Done any faster than the next guy.  So, Activity != Getting Things Done.  And I suspect I suffer from that.

His point with the Activity thing was that when you see a lot of activity, like in Ruby, that doesn't mean they're really accomplishing a lot of practical work.  He went so far as to say Java is probably where the real work is getting done.  While the Ruby people are running around, being active, making lots of noise.

His broader point was when you see all that activity, it's probably an indication that there is something new happening.  That they are pushing the capability boundaries.  And when you look at the history of our industry those expansions of capability are usually followed by contractions to suitability.

It was a pretty thought provoking talk.  Not least of all because I think it's an oversimplification that I don't fully agree with, but I haven't been able to put my finger on it yet.

Conversation - With Everyone
I go to most of the sessions at CodeMash 'cause I'd feel guilty if I didn't.  But what makes CodeMash worth my time is really the conversations with so many people doing so many different things with so many different tools on so many different platforms.  It's like what I try to do at Burning River Devs * 1000.  And it's what renews my energy for the rest of the year to keep fighting all the technical, process, and people related battles that come with building software.

Sunday, December 18, 2011

Stories of Productivity

The first time I tried pomodoro, it was exhausting.  Staying completely focused and working for 20 minutes straight tired me out!  I couldn't believe it!  I thought I was very focused, all the time.  I thought my productivity was good.  I couldn't even work for 20 minutes!

--

I used the demo of TimeSnapper for a while once.  It's a neat program.  It monitors the applications you use throughout the day.  It can even play back a video of what you did all day, greatly sped up of course.  You tell it which applications are productive, and which aren't, and it has this neat timeline graph that shows green for productive time, and red for unproductive time.  In using it I quickly discovered something that I was not consciously aware of, but was very interesting.  As I was working, if I hit a point where I had to either wait for the computer, or I didn't know exactly what to do next, I would switch to an unproductive application.

For example, if I was coding an algorithm, and I hit a particularly difficult part of it, I'd pull up twitter.  Or hit up google reader.  Or check my email.  It was like some kind of weird nervous twitch.  Any time I had to ACTUALLY think, I'd go do something that didn't require thought.  And I was totally unaware that I was doing it.

--

Recently I was taking a screencast of myself doing some work at the prompt.  It was just a proof of concept, so I hadn't planned it out, and I was sitting in front of the TV at home.  I knew what commands I wanted to record, but I hadn't really thought through the examples.  You could see I was typing fast and quickly moving from command to command.  But then I'd hit a part where I had to make up some nonsense content to put in a file, or think up a commit message, and there would be this really long pause.  The pause was way longer than it actually took me to come up with the content.  What was happening was, as soon as I needed to do some creative thinking, I'd glance up at the TV and get lost for a few seconds.  And again, I was totally unaware this was happening until I watched the video.

--

One of the things I've been struck by when I watch Gary Bernhardt's Destroy All Software screencasts, or the Katacast he did, is how fast he is.  Now, he practiced this stuff, it's not like you're watching it come off the top of his head.  But even still, he's FAST.  But I realized, the thing I'm most impressed by is really not how fast he can type.  I mean, he can type fast, and very accurately.  What's most impressive is how he is always prepared for the next step.  He always has the next thing he needs to do queued up in his brain.

Once I noticed this, I started trying to figure out how to get closer to that during day to day development.  In a surprising twist, what I've found so far is the best way to go fast is to go slow.  That's kind of a cliche, but it's overwhelmingly true.  If I give myself the time to think things through, I waste a lot less time in starts and stops and blind alleys.  And if I take just 1 second longer to fully visualize all the steps of what I'm about to do, I'm able to execute it faster, smoother, and with a lot less stress.

--

We recently re-did our office arrangement.  We tore the walls down and made sure everyone was sitting with their teams.  There have been some nice benefits.  For one thing, it's way more fun.  There are many times when spontaneous design and organization decisions are made just because everyone can hear you.  And I think we've built a better sense of team in the process.

Of course there are downsides.  It can get noisy and be distracting.  Especially when random conversations and jokes break out.  I think it's just human nature to have this desire to not be left out of conversation.  You can put in headphones, but I find sometimes even music is enough of a distraction that I can't get my thoughts straight.  And because I don't want to be left out, I usually keep the volume just low enough so I can track what's going on around me.

So there is a trade-off with this open-space, everyone-together layout.  You gain some productivity in instantaneous, meeting-less decisions.  You gain some camaraderie and some fun.  But you can't close the door and shut out the world so you can fully focus when you need to.  I'm still not sure how I feel on this one.  The focus- and productivity-obsessed part of me likes Peopleware's advice of everyone in their own office with a door.  But the social part of me likes the team room model.

Thursday, December 1, 2011

Powershell: Extracting strings from strings

I was playing with NCrunch.  It didn't work for our solution due to some bug w/ named parameters in attributes.  So I removed it.  But it left behind all kinds of little .xml files.  I could see these files in hg st as "?"'s and I wanted to remove them.

So I used this simple powershell command:
hg st | %{ [regex]::match($_, ". (.*)") } | %{ $_.Groups[1].Value } | %{ rm $_ }
The regex captures the name of the file, skipping the "? " at the beginning of the line.  The Groups[1].Value extracts that file name.  And rm removes it.

That version uses the .NET regex class directly and pipes the Match objects it outputs.  You can make this shorter, though slightly more confusing in some ways, using PowerShell's -match operator:
hg st | %{ $_ -match ". (.*)" } | %{ rm $matches[1] }
This is using the magic $matches variable which is set to the results of the last executed -match operator.  The reason I say this is slightly more confusing is that it depends on the order of execution of the pipeline.  This wouldn't work if the pipeline ran all the -match's and then ran all the $matches.  But because the pipeline is executing each %{} block once for each object, it does work.
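If that ordering subtlety makes you nervous, a variation that sidesteps it entirely (just a sketch of the same idea) is to do the match and the rm in the same block:
hg st | %{ if ($_ -match ". (.*)") { rm $matches[1] } }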

If hg output objects instead of text, this would have been much easier.  But this shows how you can lean on regexes when you have to deal with strings.

Friday, September 23, 2011

Powershell and Hg Magic

I moved a bunch of files that were in an hg repo and did an hg addremove -s 100.  They all should have been recorded as renames, but hg summary showed me that 1 of them wasn't.  But which one?

Powershell to the rescue!
$s = (hg st -a -C) -join "`n"
[regex]::matches($s, '^A.*$\n^A.*', "Multiline")
Let's break this down:
  • hg st -a -C: lists all added files including what file they were copied from.  Hg st considers a rename to be a copy and a remove.  For each renamed file this will output two lines:
    A <file\path\here>
      <copied\from\path\here>
  • $s = (...) -join "`n": takes the array of strings resulting from the hg st command and joins it into one big string in the $s variable.
  • [regex]::matches($s, '...', 'Multiline'): Runs a multiline regex on the string
  • '^A.*$\n^A.*': Regex matches a line that starts with an A, followed by anything to the end of the line, followed by a line break, followed by another line that starts with A, followed by anything.  In other words, this will match if two consecutive lines of output both start with A.  In this case, that means the first line is the file that was not recorded as a rename!
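And if you just want that offending file name out of the match (an untested follow-up sketch), grab the first line of each match's value:
[regex]::matches($s, '^A.*$\n^A.*', "Multiline") | %{ ($_.Value -split "`n")[0] }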

Tuesday, August 23, 2011

.NET is Stale?

Here's dhh on twitter: "Wish someone would study the cultural inhibitions in Denmark that binds it to stale, conservative platforms like .NET"

.NET is stale?  Fuck you!

Not to mention the language features of C#:

Is C# the most elegant language ever invented?  No, but it is one of the most elegant I have used, especially for a statically typed language.  And the language itself is clearly one of the most advanced available.  This is stale?

Did all of these ideas originate in .NET?  No, but what the hell difference does that make?!  The .NET community finds and adopts the best ideas, whether they started in Java, Ruby, or Python.  This is stale?

Are there companies still using .NET 2.0 and little to no open source software?  Yea, there are also companies on the bleeding edge, using all the tools listed above.  From organizations with strict upgrade guidelines, to organizations that wait for the first service pack, to organizations that go to production on beta releases.  You'll find it all in the .NET community.  This is stale?

Ruby is a joy to program in.  Dynamic languages are more fun to do TDD with.  Percentage wise, I'm sure more Ruby programmers participate in the open source community.  There are a wide array of really great things about Ruby (and Python, etc etc).  There are also plenty of shitty things (poor backwards compatibility, poor documentation, poor tutorials, elitist attitude, etc etc).

But this bullshit attitude that .NET is stale, outdated, joyless, or somehow dramatically inferior is nothing but short-sighted and stupid.  Get over your buyer's remorse and go build some software that contributes to something larger than yourself.

* Did I leave off your favorite fresh .NET tool or feature?  Leave it in the comments.