Wednesday, November 14, 2012

Encapsulation: You're doing it wrong

In the last post, I investigated just what the devil encapsulation actually is.  I may not have answered that question, but I did decide that whatever it means, there's a subtle but important distinction to be made around encapsulating "data".  The example that launched that distinction was a Queue, which stores data from the caller in some encapsulated implementing data structure.  Notice the distinction between the caller's data and the Queue's implementation.

One way of approaching a new OO design in a "business" environment is to ask: what data do I have?  Then create a "model" and add a property for each data element.  C#'s { get; set; } properties strongly encourage this, and most ORM and ActiveRecord tools require it.  So now we have little data classes; structures, basically.  But we know that we're not doing OO unless we're doing encapsulation, and that means we need some methods!  So we add some methods to our little data classes that usually either modify that data in some way or perform some calculations with it.
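To make that pattern concrete, here is a rough sketch of such a "model", written in Java rather than C# (getter/setter pairs stand in for { get; set; }). The Customer class, its fields, and its two methods are hypothetical, invented only to show the shape: one accessor pair per column, plus methods that work on the same exposed data.

```java
import java.util.Locale;

// Hypothetical "model" built data-first: one field and one getter/setter
// pair per database column, with a couple of methods bolted on afterwards.
class Customer {
    private int id;
    private String firstName;
    private String lastName;
    private String email;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }

    // A calculation over data that is already fully exposed.
    public String fullName() { return firstName + " " + lastName; }

    // A modification of data that any caller could also modify directly.
    public void normalizeEmail() { email = email.trim().toLowerCase(Locale.ROOT); }
}
```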

But what is this class encapsulating?  All the data is fully exposed, and the methods are restricted to simple operations on the same data.  Clearly it's trying to represent something, but we started from some data that most likely corresponds directly to a database table.  So what is it representing?  At best, one data thing.  And what is it encapsulating?  Some logic about that data.

But looking at this again from the perspective of encapsulation as bundling implementation details instead of data, we could go a different route.  When thinking about a Queue, I don't think about its internal implementing data structure.  I think about the operations I want it to perform for me.  So instead of asking "what data do I have?", asking "what operations do I need to perform?" could be a better starting point.

What if all the properties were moved off the object onto their own little class (or structure, or in F#, a record)?  The original object would then be left with operations only.  One of those operations would have to be getting the data, and that would just be a simple method that returns the little data class/structure/record.  This class is encapsulating the implementation details of the operations you decided you needed to perform!  And just like the Queue, there is now a clear distinction between the caller's data and the class's implementation.
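A minimal sketch of that split, assuming Java 16 or later for records. CustomerData and Customers are hypothetical names, and the in-memory map is only a stand-in for whatever the real implementation would hide (a database, a web service, and so on). The caller only ever sees operations and the record they pass or return.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// The caller's data: a plain immutable record (hypothetical example type).
record CustomerData(int id, String name, String email) {}

// Hypothetical operations-only class. How customers are stored is an
// implementation detail the caller never sees.
class Customers {
    // Hidden implementation detail: an in-memory map, standing in for
    // whatever storage a real system would use.
    private final Map<Integer, CustomerData> byId = new HashMap<>();
    private int nextId = 1;

    // Operation: register a customer and hand back the resulting data.
    public CustomerData register(String name, String email) {
        CustomerData data = new CustomerData(nextId++, name, email);
        byId.put(data.id(), data);
        return data;
    }

    // Operation: change an email address by replacing the stored record.
    public void changeEmail(int id, String newEmail) {
        CustomerData old = byId.get(id);
        if (old == null) throw new IllegalArgumentException("no customer " + id);
        byId.put(id, new CustomerData(id, old.name(), newEmail));
    }

    // Operation: "getting the data" is just another method returning the record.
    public Optional<CustomerData> get(int id) {
        return Optional.ofNullable(byId.get(id));
    }
}
```

Because the record is immutable, a caller holding an old CustomerData is never surprised by a change made through changeEmail; it has to ask for the data again.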

A number of interesting benefits follow from this:
  • Enables a coarser-grained interface, which is especially useful for data access.  You gain the control to define operations that retrieve as little or as much data as you need.
  • Designing around encapsulating implementation details leads to objects that are well defined with intuitive behaviors and clear purpose.  Ultimately that means it's easier to find the behavior you want, and extend behavior when needed.
  • The resulting clean behavioral interface, passing and returning data, immediately results in simple and flexible decoupling, which is great for unit testing.
And these are only the benefits realized at the level of the one class we modified.  In the next post I want to look at what happens when this architecture is applied throughout, in what I call a stratified design.
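The first and last bullets in the list above can be sketched together: an interface of coarse-grained operations, each returning just the data one use case needs, and a consumer that depends only on that interface, so a unit test can hand it a trivial in-memory fake. Everything named here (CustomerQueries, InvoicePreview, the records, the fake) is hypothetical, and the code assumes Java 16+ for records.

```java
import java.util.List;

// Hypothetical data records, sized to what each caller actually needs.
record CustomerSummary(int id, String name) {}
record OrderLine(String product, int quantity, long priceCents) {}

// Coarse-grained operations: one call per use case instead of
// one property per column.
interface CustomerQueries {
    CustomerSummary summary(int customerId);
    List<OrderLine> openOrderLines(int customerId);
}

// A consumer that only passes and receives plain data through the interface.
class InvoicePreview {
    private final CustomerQueries queries;

    InvoicePreview(CustomerQueries queries) { this.queries = queries; }

    String render(int customerId) {
        CustomerSummary who = queries.summary(customerId);
        long total = 0;
        for (OrderLine line : queries.openOrderLines(customerId)) {
            total += line.quantity() * line.priceCents();
        }
        return who.name() + ": " + total + " cents";
    }
}

// In a unit test, this fake replaces the real data access entirely.
class FakeCustomerQueries implements CustomerQueries {
    public CustomerSummary summary(int id) { return new CustomerSummary(id, "Ada"); }

    public List<OrderLine> openOrderLines(int id) {
        return List.of(new OrderLine("pen", 3, 150), new OrderLine("ink", 1, 500));
    }
}
```

With the fake, `new InvoicePreview(new FakeCustomerQueries()).render(7)` computes 3 × 150 + 500 and returns "Ada: 950 cents" without touching a database.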

Tuesday, November 13, 2012

Encapsulation: What the devil is it?

I love the word 'Encapsulation.'  It's a big fancy word and I feel smart when I use it.  Unfortunately, I'm not really sure what it means, and neither is Wikipedia.  "Encapsulation is to hide the variables or something inside a class."  I lol'd when I read "or something," what a specific definition!  So, what the devil is it?

The most naive OO definition might be:
A language feature that bundles data and methods together.
You might extend that to say that it hides the data from public consumption, but that part muddies the water, as the Wikipedia article demonstrates.  My favorite example of Encapsulation is a Queue class.  You get push, pop, and peek operations to call, but you don't know what data structure the Queue uses to implement those operations.  It could be an array, it could be a linked list, whatever.  In this we can easily see the beauty and the power of encapsulation: "data" and "methods" together.
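Here is one way that could look, sketched in Java. SimpleQueue, ArrayQueue, and LinkedQueue are hypothetical names (chosen to avoid confusion with java.util.Queue). Both classes offer the same push, pop, and peek operations; one hides a growable circular array, the other a linked list.

```java
import java.util.NoSuchElementException;

// The operations the caller sees. Nothing here says how items are stored.
interface SimpleQueue<T> {
    void push(T item);   // add to the back
    T pop();             // remove and return the front
    T peek();            // return the front without removing it
    boolean isEmpty();
}

// Implementation 1: a circular array that doubles when full.
class ArrayQueue<T> implements SimpleQueue<T> {
    private Object[] items = new Object[4];
    private int head = 0, size = 0;

    public void push(T item) {
        if (size == items.length) {
            Object[] bigger = new Object[items.length * 2];
            for (int i = 0; i < size; i++) bigger[i] = items[(head + i) % items.length];
            items = bigger;
            head = 0;
        }
        items[(head + size) % items.length] = item;
        size++;
    }

    public T pop() {
        T item = peek();
        items[head] = null;  // drop the reference so it can be collected
        head = (head + 1) % items.length;
        size--;
        return item;
    }

    @SuppressWarnings("unchecked")
    public T peek() {
        if (size == 0) throw new NoSuchElementException("queue is empty");
        return (T) items[head];
    }

    public boolean isEmpty() { return size == 0; }
}

// Implementation 2: a singly linked list with head and tail pointers.
class LinkedQueue<T> implements SimpleQueue<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head, tail;

    public void push(T item) {
        Node<T> node = new Node<>(item);
        if (tail == null) head = tail = node;
        else { tail.next = node; tail = node; }
    }

    public T pop() {
        T item = peek();
        head = head.next;
        if (head == null) tail = null;
        return item;
    }

    public T peek() {
        if (head == null) throw new NoSuchElementException("queue is empty");
        return head.value;
    }

    public boolean isEmpty() { return head == null; }
}
```

A caller written against SimpleQueue<String> cannot tell which one it was given: swapping `new ArrayQueue<>()` for `new LinkedQueue<>()` changes nothing it can observe.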

But wait, what did I actually encapsulate in that queue?  Was it the "data"?  I pass my data into and out of the Queue, and the Queue hides its implementing data structure from me.  Maybe it's a Queue<string>, and I'm all: q.Push("encapsulation"); q.Push("is"); q.Push("about"); q.Push("data"); Assert.AreEqual("encapsulation", q.Pop());  For me the data are those strings, but those strings are clearly not what the Queue is encapsulating!

That word "data" in our definition is a tricky one.  It can be applied to too many things to be really useful.  But does replacing "data" with "data structure" in the definition fix the problem?
A language feature that bundles data structures and methods together.
It clears it up, but it introduces another problem.  For example, what if my object is a database gateway?  Certainly there's a data structure somewhere in that database, but my object isn't directly encapsulating that!  No, it's probably "encapsulating" ADO.NET procedural calls, or some other data access library.  The procedural calls are neither data nor data structure...  So could it be that thinking about data is completely misleading?
A language feature that bundles implementation details and methods together.
This is a rather large step, though!  Instead of just data or data structures, the definition now admits just about anything as something that can be bundled with methods to achieve encapsulation!  Maybe there is some value in restricting what the word "encapsulation" applies to, but if there is, I doubt it's something that is going to prove useful for software engineers.  So while I admit this definition could be a perversion of the word "encapsulation," I find it more useful.
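The database gateway case might be sketched like this. The LowLevelDb interface is entirely hypothetical, a stand-in for the open/read/close style of ADO.NET or JDBC rather than either library's real API; Product and ProductGateway are likewise invented, and records assume Java 16+. The point is what the gateway bundles away: not a data structure, but a sequence of procedural calls and some SQL text.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL procedural data-access API (not a real library), modeling
// the cursor-handle style of low-level database calls.
interface LowLevelDb {
    int openCursor(String sql, Object... params); // returns a cursor handle
    boolean next(int cursor);                      // advance; false when done
    String getString(int cursor, String column);
    void closeCursor(int cursor);
}

record Product(String sku, String name) {}

// The gateway exposes one operation. What it hides is the call sequence,
// the SQL, and the cleanup, none of which is "data".
class ProductGateway {
    private final LowLevelDb db;

    ProductGateway(LowLevelDb db) { this.db = db; }

    List<Product> findByCategory(String category) {
        int cursor = db.openCursor("SELECT sku, name FROM products WHERE category = ?", category);
        try {
            List<Product> result = new ArrayList<>();
            while (db.next(cursor)) {
                result.add(new Product(db.getString(cursor, "sku"), db.getString(cursor, "name")));
            }
            return result;
        } finally {
            db.closeCursor(cursor); // release the cursor even if reading fails
        }
    }
}
```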

The other definition Wikipedia gives for encapsulation, which I've neglected to tell you about until now, is "A language mechanism for restricting access to some of the object's components."  This is closer to the definition I just ended on.  I take some issue with the word "restricting," and the word "components" is ambiguous enough to be a problem.  But I don't think it's a stretch to think "components" could include both data and dependencies.

So, perhaps we've arrived at a better understanding of encapsulation, one that recognizes that data is not the all-important concept.  The next step I'd like to take is to extend this slightly deeper understanding beyond data structures and into more typical "business" scenarios.  That will be the next post.