kwblog: April 2007

Wednesday, April 11, 2007

Memory Leaks, Garbage Collection, and .NET

I have frequently read and been told that .NET (and Java) do not have memory leaks. Of course, this depends on the definition of a memory leak.

One definition says a memory leak occurs when a program allocates memory from the operating system but never gives it back. This is pretty clear, so I'm going to stick with it.

Why does .NET not have memory leaks? It has a garbage collector. The garbage collector checks your memory for you every so often to determine if it is still referenced anywhere. If it is, it's assumed you're using it. If it isn't, you can't possibly be using it, and the memory is freed and given back to the system.

Therefore, languages with garbage collectors clearly can't have memory leaks, right? I'm going to show you why I disagree with that statement.

Let's look at a simple example in C#. Suppose you've written a class which acts as a DataTable cache. This will be a static class. The purpose of this class is to store DataTables which will be used by various controls throughout the application. If a DataTable gets updated, this class will update all the controls that use that DataTable.

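    using System.Collections.Generic;
    using System.Data;
    using System.Windows.Forms;

    public static class DataTableCache
    {
        // Fields of a static class must themselves be static.
        static Dictionary<string, DataTable> dtCache = new Dictionary<string, DataTable>();
        static Dictionary<Control, string> controlStore = new Dictionary<Control, string>();

        public static void AddDataTable( string key, DataTable table, Control ctl )
        {
            if ( dtCache.ContainsKey( key ) )
            {
                // Replace the cached table for an existing key.
                dtCache[key] = table;
                // update controls using key, if any exist yet
            }
            else
            {
                dtCache.Add( key, table );
                // update controls using key, if any exist yet
            }
            AddControl( ctl, key );
        }

        public static void AddControl( Control ctl, string key )
        {
            // The static dictionary now holds a strong reference to ctl.
            controlStore.Add( ctl, key );
        }

        public static DataTable LookupDataTable( string key )
        {
            if ( dtCache.ContainsKey( key ) )
                return dtCache[key];
            else
                return null;
        }
    }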

I'm keeping this example as simple as possible, but it is modeled on real code. There is a "memory leak" problem here. Suppose you create a Form, f. On that form you add a ComboBox, cb. You obtain a DataTable which you use as cb's DataSource and you add them both to the DataTableCache. When f is closed it will be disposed along with all of its controls, including cb. However, because cb is still in the controlStore dictionary it will never be freed by the garbage collector. It has been disposed, but it is still referenced by our static DataTableCache.
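
To see the problem in isolation, here is a minimal sketch that doesn't need a Form. It assumes System.ComponentModel.Component as a stand-in for a WinForms Control (Control derives from it, so both have Dispose and a Disposed event). ControlRegistry and DisposeDemo are hypothetical names, and the registry is reduced to just the controlStore dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical stand-in for DataTableCache, reduced to its controlStore part.
public static class ControlRegistry
{
    static readonly Dictionary<Component, string> controlStore =
        new Dictionary<Component, string>();

    public static void AddControl( Component ctl, string key )
    {
        controlStore.Add( ctl, key );
    }

    public static bool IsStillReferenced( Component ctl )
    {
        return controlStore.ContainsKey( ctl );
    }
}

public static class DisposeDemo
{
    // Returns true when a disposed component is still held by the registry.
    public static bool DisposedButStillReferenced()
    {
        Component cb = new Component();   // plays the role of the ComboBox
        ControlRegistry.AddControl( cb, "customers" );
        cb.Dispose();                     // what closing the form does to its controls
        return ControlRegistry.IsStillReferenced( cb );
    }
}
```

DisposeDemo.DisposedButStillReferenced() returns true. Dispose releases the component's own resources, but it does nothing about references that other objects still hold to it, and a reference held from a static field lasts as long as the application does.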

No, don't worry, I'm not claiming this qualifies as a memory leak. This is simply programmer error. The programmer should have accounted for this problem. There are two obvious ways to fix this. The first might be to add a RemoveControl method and call that when the form closes. The second, and better approach, is to register the control's Disposed event in DataTableCache and remove it from the dictionary when that event handler is fired. And while we're at it, we'll also make the event handler check to see if any other controls are still using the disposed control's key after it is removed. If none are, we'll remove the DataTable too.

AddControl would now look like:

    public static void AddControl( Control ctl, string key )
    {
        controlStore.Add( ctl, key );
        ctl.Disposed += new EventHandler( ctl_Disposed );
    }

And the event handler would look like:

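    // Must be static, like every other member of a static class.
    private static void ctl_Disposed( object sender, EventArgs e )
    {
        string key = controlStore[(Control)sender];
        controlStore.Remove( (Control)sender );
        // Drop the table once no remaining control uses its key.
        if ( !controlStore.ContainsValue( key ) )
            dtCache.Remove( key );
    }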

Problem solved. No more memory leak.
Actually, wrong. We still have a memory leak. The DataTable will be removed from the cache and freed by the garbage collector, but the Control won't be. If you don't believe me, try it out yourself. You can use a memory profiler to see that the combo box remains in memory no matter how many times you run the garbage collector.

This is the memory leak! Nowhere in our code do we have a reference to the combo box, so why isn't the garbage collector working? Well, it turns out it's because we registered the Disposed event. Any time you register an event you get a little intermediate object behind the scenes, a delegate, which links the object you registered the event on with the object whose method handles it. So if you have ever registered an event between two objects, where one of them had a longer lifetime than the other, then you may well have a memory leak. And if one of those objects is static, as in this example, then the memory won't be given back to the system until the application is closed.

How serious is this? It depends on the circumstances, but most of the time it will be pretty serious. In our example it's very likely that the Form, f, had registered a handler for one of the events of the ComboBox, cb, such as SelectedValueChanged. That's a very common need. Because of this, neither cb nor f will ever be freed. And imagine how many controls could potentially be on f. Now none of them can be freed either.
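
The chain from cb to f can be made visible with a small sketch. Publisher and Subscriber are hypothetical stand-ins for the ComboBox and the Form, and the Changed event is made up for the example. The delegate stored inside the publisher carries a reference to the subscriber in its Target property, so anything that keeps the publisher alive keeps the subscriber alive too.

```csharp
using System;

// Hypothetical stand-in for the ComboBox: the object that raises the event.
public class Publisher
{
    public event EventHandler Changed;

    // Exposes the delegates stored in the event's backing field.
    public Delegate[] Handlers()
    {
        return Changed == null ? new Delegate[0] : Changed.GetInvocationList();
    }
}

// Hypothetical stand-in for the Form: the object that handles the event.
public class Subscriber
{
    public void OnChanged( object sender, EventArgs e ) { }
}

public static class ReferenceChainDemo
{
    // Returns true when the publisher's stored delegate points back at the subscriber.
    public static bool PublisherReferencesSubscriber()
    {
        Publisher cb = new Publisher();
        Subscriber f = new Subscriber();
        cb.Changed += new EventHandler( f.OnChanged );

        Delegate[] handlers = cb.Handlers();
        return handlers.Length == 1 && object.ReferenceEquals( handlers[0].Target, f );
    }
}
```

ReferenceChainDemo.PublisherReferencesSubscriber() returns true: the only thing connecting the two objects is the delegate, and it is enough to make f reachable from cb.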

How do we solve this? It's very easy: we just modify the Disposed event handler as follows:

    private static void ctl_Disposed( object sender, EventArgs e )
    {
        string key = controlStore[(Control)sender];
        controlStore.Remove( (Control)sender );
        // Unsubscribe from the control that raised the event (the sender, not the key).
        ((Control)sender).Disposed -= new EventHandler( ctl_Disposed );
        if ( !controlStore.ContainsValue( key ) )
            dtCache.Remove( key );
    }
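
It may look odd that the handler above unsubscribes with a brand new EventHandler instead of the one that was registered. Here is a minimal sketch of why that works, using a hypothetical Source class in place of the Control. Delegates compare equal when they have the same target and method, so the new delegate matches the registered one and removes it.

```csharp
using System;

// Hypothetical event source, standing in for the Control.
public class Source
{
    public event EventHandler Disposed;

    public void RaiseDisposed()
    {
        EventHandler handler = Disposed;
        if ( handler != null )
            handler( this, EventArgs.Empty );
    }

    public int HandlerCount()
    {
        return Disposed == null ? 0 : Disposed.GetInvocationList().Length;
    }
}

public static class UnsubscribeDemo
{
    static int calls = 0;

    static void Source_Disposed( object sender, EventArgs e )
    {
        calls++;
        // Same pattern as the handler above: detach from the sender.
        // The fresh delegate equals the registered one (same target, same method).
        ((Source)sender).Disposed -= new EventHandler( Source_Disposed );
    }

    // Returns { handlers before, handlers after, times the handler ran }.
    public static int[] Run()
    {
        calls = 0;
        Source s = new Source();
        s.Disposed += new EventHandler( Source_Disposed );
        int before = s.HandlerCount();
        s.RaiseDisposed();
        s.RaiseDisposed();   // no handler left, so nothing runs
        return new int[] { before, s.HandlerCount(), calls };
    }
}
```

UnsubscribeDemo.Run() returns { 1, 0, 1 }: one handler registered, none left after the first raise, and the handler ran exactly once even though the event was raised twice.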

I consider this a memory leak because 1) it's hard to consider this programmer error, since it's actually the .NET framework which holds the references, and 2) this problem is very, very hard to remember to avoid. This means you really do need to use a memory profiler on your applications before releasing them. And therefore I claim that C#, at least, does suffer from memory leaks.

I haven't had a chance to test this in any other languages. I'd be very interested to hear if the same is true in Java, Python, and Ruby, for example.

Thursday, April 5, 2007

Fair Witness

In the book Stranger in a Strange Land, Robert Heinlein introduces the concept of a Fair Witness. In the book a Fair Witness is someone with complete memory recall (they remember EVERYTHING!) who makes no assumptions about anything. They're used for various legal things because they are infallible observers.

There's a quote from the book which I always liked. I can't remember it exactly but it goes something like:

"Anne, function as a Fair Witness! What color is that house over there?"
"This side is blue."

Obviously the other sides could be any color at all. But it wouldn't occur to you to think that way, and in most cases it wouldn't make sense to.

However, attempting to assume this kind of mindset can be enormously helpful for a software developer. How many times have you been tracking down a bug for hours and hours and hours, unable to figure out what's causing it? "Everything looks right!" you keep saying to yourself. Finally you discover the problem: you had made an assumption about some part of your code that turned out not to be true.

Sometimes it's an assumption that you programmed into the code that turns out to be false. These are usually easier to find. Other times it's an assumption you made in your mind about the behavior of the code. Could be, "I know this method is working fine" or "It will never do xyz because of abc." Then when you find the bug you realize that not only was your code wrong but you were wrong too. It was two bugs in one.

Of course, being human we can't avoid making assumptions. Assumptions are powerful for the same reason abstraction in Object Oriented Programming is powerful. It lets you forget about the details and focus on a higher level idea. But in my limited experience I've noticed that being human also comes with a tendency to want to rush. And nothing helps you rush more than making assumptions. So now when I set out to find a bug I always remind myself to function as a Fair Witness. I'll still make incorrect assumptions from time to time, but I'll avoid making a countless number of rush-assumptions. And ultimately, I'll end up saving myself time and frustration.

That being said, making assumptions in debugging can be a good thing too. You just have to make your assumptions in a very conscious manner. As in, this side of the house is blue, so I'm going to assume the other sides are as well. Then if you run into an inconsistency later on you'll remember you made the assumption and you can go back and examine if it was the right one. The author of The Old New Thing blog calls this Psychic debugging. I call it functioning as a more flexible Fair Witness.

Wednesday, April 4, 2007

Why?!

There are a few reasons for this blag.

I like to write my opinions about software and technology like things.
There is a wide community of people on the interweb who are writing similar things. This blog might allow me to join them.
I have a website which includes a forum that I wrote in ASP over 5 years ago (as of this posting). It's been great fun, but it doesn't allow me to blog, as it were. It's too open and public. It's also hosted on a computer in my basement, which isn't terribly reliable.
I thought I should learn how this blogging software worked.

I may port things I wrote on my original website to this blog if I feel like I'm getting something out of this. If I do, I'll indicate that's where it came from along with the original date in the interest of full disclosure.

And with that...