kwblog: sql server

Showing posts with label sql server. Show all posts

Friday, March 21, 2008

Varbinary and Service Broker, a story

Today I was working on a very simple SQL 2005 Service Broker service. It was to work like this: someone executes a stored procedure passing in some meta data and a varbinary(max) which contains a document. The stored procedure inserts a record into the database with the meta data, then builds a service broker message containing the varbinary and some other required information. A windows service is monitoring the service broker queue for messages. When one arrives it parses it out and creates a file out of the bytes.

This took me about an hour to get through this morning (I already had written a nice service broker library, which is why it was so easy). However, I spent the rest of the day fighting with setup, data types, and encodings.

Problem #1: Windows installer service can't start. Error 193: 0xc1

A very helpful error message which popped up immediately after I tried to start my service. It was clear the OnStart method was never invoked. Apparently this error can be caused by a random "program" file in C:\ on occasion. In my case that wasn't the problem.

Nope, my case was much dumber than that. C# Services, like most programs, require a main method, usually in a program.cs file, to execute. I didn't have one because I started with the visual studio class library template instead of the windows service template... Once I realized what was missing this was easy to fix. But realizing it took longer than you might think.

Problem #2: Invalid XML Document (1, 256)

It turns out that when you define a SqlCommand and setup a bunch of Parameters on it and you tell the parameter that its length is 256, it's actually going to truncate anything you pass into it to 256 characters... In this case, my xml message was longer than that. Duh.

Problem #3: Invalid XML Document (1, 1)

This one is weird and I can not explain it. Say you create an XML variable in SQL and use it as your service broker message like so:

declare @msgX XML

Then in C# you define the message_body parameter as an nvarchar(max). The types are mismatched, but we know XML is just a string, so everything should be just fine, right?

Wrong. I don't know why. But the string you get out will have some random byte as the first byte. If you print it to the Windows Event Log it will look like a little box. This byte causes C#'s XML parser to blow up. If you change the SqlCommand parameter to XML(max) (so the data types match) the byte goes away...

Problem #4: Encoding the varbinary data into XML from SQL

You can't build your xml message just by concatenating strings together if you're dealing with varbinary. You can't cast the varbinary variable to a string either. What you need to do is encode the varbinary. In SQL, the easiest way to do this is:

declare @encoded xml
set @encoded = ( SELECT @varbinary FOR XML, TYPE )

This will automatically encode your binary into base 64 text.

Problem #5: Decoding the varbinary out of the XML in .NET

It took me awhile to find the right way to do this. At first I was trying to use the static Encoding class like

Encoding.Unicode.GetBytes( encodedData );

But this was just giving me the encoded string back.

Finally I found the right way to do it:

byte[] b = Convert.FromBase64String( encodedData );

Moral of the Story or What I learned:
Debugging Windows Services is hard. You have to print messages to the Event Log to figure out what is going on. And every time you make a change you have to: uninstall the service, rebuild the solution, rebuild the setup file, reinstall the service, go to computer management services, set the Run As permissions on the service, start the service, and check the event log to be sure it started correctly. THEN you can do whatever you need to test it.

This takes time. And it sucks.

So what I learned is, take the time to build a test, or refactor the code, or copy and paste the code if you have to. Even though you think, "oh! I just have to do this and it will work" take the time. Cause it's not going to work after you do that. And then you're going to have to do it all again. So just create a nice little testing environment for yourself. Its a one time expense, and then everything will go much faster from there. You'll be much happier, and if you're lucky, you wont have to spend all day debugging something that had basically been done at 11am.

Friday, December 28, 2007

SQL Server 2005 Service Broker

Service Broker is a very nice feature of SQL Server 2005. Like most things, it’s quite simple after you’ve figured it all out, but when you’re first learning, it can be quite daunting. When it comes to learning about SQL Server the main problem is that all the words Microsoft uses to describe things are so overloaded it’s easy to get confused. Everything is an "application" and everything is a "service"... This rapidly becomes confusing... "Which application are they talking about now? Their’s? Mine? What service? My service? The service broker service? That stored procedure?"

It also becomes confusing because SQL Server has so many different "technologies" built into it. A lot of them seem to do the same thing, but in fact they’re different, some more than others. For example, there is Notification Services, Service Broker, Event Notification, Internal Activation, External Activation, etc. And then it gets even more confusing because some of these "technologies" are just combinations of other "technologies." External Activation, for example, is really just Event Notification applied to Service Broker and it uses Service Broker Queues to deliver its messages.

In this post I’m going to briefly go over some of the things I’ve learned about Service Broker. If you’re completely new to Service Broker and you want a quick detailed overview with code samples you may want to go read this tutorial or browse through MSDN’s service broker articles.

So then, what is Service Broker? It is simply a queued message delivery system. A message sent from one person to another is guaranteed to not only arrive but arrive in order, and in the event that something goes wrong, not be lost. Of course there are a lot of details to describe exactly how it behaves, but that’s really all there is to understanding it.

What would you use it for? Lots of things! The uses I’m most concerned with are "Asynchronous triggers" and "Large-scale batch processing." In other words you can drop a message into a queue and then forget about it, continuing with your own work and ultimately completing. In the background, you know that your message will at some point be received and handled. In this way you can offload work from stored procedures or triggers to be performed asynchronously.

So here is the big question, if we’re sending messages from person A to person B, who or what are these people? They are anything which is capable of executing SQL commands. Thus it can be a stored procedure, a C# application using dynamic SQL, a Windows Service, anything. Notice that any of these things can be either the sender or the receiver!

Here’s a simple example. The sender can be a stored procedure which you invoke. This stored procedure will simply execute the needed commands to send a message from himself to the receiver. The receiver can also be a stored procedure, but a slightly unusual one. This stored procedure will be configured to start when SQL Server starts and it will run infinitely in a loop, periodically checking for new messages. If this is beginning to sound a lot like Socket programming to you, then you already understand Service Broker.

Where is the queue in this example? Well, actually, there are two of them. The sender has a queue and the receiver has another. Why? So far I’ve been describing this as a dialog. Person A always sends messages to B who always receives them. B never sends anything to A. But, in fact, that’s not the way Service Broker works. Service Broker always assumes a dialog in which A and B are both sending and receiving messages to and from each other. That doesn’t mean you have to carry on a dialog, in fact when I’ve used Service Broker I’ve only had a monolog. But that’s simply in the way I’ve written the applications that are using Service Broker. I didn’t do anything different in the way I setup Service Broker to make it work that way.

Back to our example, since we have two stored procedures, could we put them in two different databases? Indeed we can! In fact, that is one of the main uses of Service Broker. You could put them in two different catalogs of the same server, or you could put them in two completely different servers. Now you can effectively have two different databases talk to each other.

Having a stored procedure that is always running and monitoring the queue in a loop works just fine and there is really nothing wrong with that approach. However, Service Broker comes with another option for receiving from a queue called "Internal Activation." Basically, SQL does the looping in the background for you and when a message arrives on the queue it invokes a stored procedure. This stored procedure pulls the message off the queue and does whatever it wants to do with it. Usually the stored procedure is written to then enter a loop where it pulls messages off the queue until the queue is empty, at which point the stored procedure ends. Internal Activation will also invoke multiple instances of your stored procedure if the currently running procedures aren’t able to keep up with the number of messages arriving at the queue. In this way Internal Activation can help your message handling scale based on demand.

When I first heard about this a little red flag went up in my brain. I thought to myself, "Self! I thought messages were guaranteed to be delivered in order with Service Broker. If Service Broker is going to launch multiple "handler" stored procedures in parallel, how is it that the messages will be handled in order?" It turns out the answer lies in another Service Broker concept I haven’t mentioned yet called a conversation. Messages have to be sent on a conversation. Messages within the same conversation will arrive AND be handled in the order they were sent, guaranteed. But messages between conversations can be handled in parallel. This behavior is performed magically in the background without you having to do anything. It happens because whoever receives a message on a new conversation automatically obtains a lock on that conversation and they will only be able to receive messages from that conversation, until the conversation ends. But again, if no one had told you, you’d scarcely be aware of it at all.

If there is Internal Activation, there must be External Activation as well, right? Right. Internal Activation launches a stored procedure inside SQL Server, External Activation is for launching arbitrary programs outside SQL Server. .NET programs, C++ programs, Ruby programs, whatever. At least, that’s what you’d expect. In fact it’s kind of a misnomer. The Internal Activator launches the internal stored procedures. Internal Activation refers more to when the activator should do its work. The same is true for External Activation. The difference is that SQL Server 2005 doesn’t come with an External Activator. So you have to write your own. You can do this, because SQL Server 2005 does have External Activation, which tells you when a Queue has been activated.

External Activation is performed with SQL Server Event Notification. In the parlance, you’d write an external application which receives the external activation event (called the Queue Activation event). Okay... So how does the external app "receive an event?" Through Service Broker! That’s right, you create another Service Broker Queue and the Queue Activation event is placed in that queue as a message. Then you write an external application that monitors this new queue and creates new application processes which behave just like the activated stored procedures from Internal Activation. I’m not going to go into more details of how this works. It is a bit more complicated of course. However, Microsoft does have a sample.

How does this external application monitor the Service Broker queue? Exactly the same way as our first example stored procedure did. It just sits there in an infinite loop waiting for messages to arrive on the Queue. In other words, it just executes a SQL statement inside a loop.

So, what’s the difference between using an External Activator and just writing an application that monitors the actual queue in question (not the "queue activation" queue)? The only difference is that the External Activator will spawn off multiple handlers applications and therefore scale better with demand. Just writing a monitoring application will only handle one message at a time (unless you get fancy, which could get you into trouble in some circumstances due to conversation queuing as mentioned earlier). However, there are circumstances where this is all you need. In fact, all the ways I’ve used Service Broker to date have worked this way.

This is partly because all my Service Broker "applications" have been monologs and the processing they have performed wouldn’t really go any faster if it were done in parallel. In fact, I have never used more than one conversation. So far, I always create a single conversation. I cache the conversation handle in a database table so I can reuse it whenever I send a message. This is because most of the performance impact when using Service Broker comes from creating a conversation. That makes sense since the conversation is actually the unit that has guaranteed in order message delivery. By caching the conversation and never ending it or starting a new one, you improve the performance. This is discussed in an article called Reusing Conversations. You have to consider some scalability concerns before deciding to use exactly one conversation. For me however, I felt that was good enough (not to mention simple).

For me, the receiving application is a Windows Service written in C#. It creates a background thread that loops indefinitely issuing a WAITFOR( RECEIVE ... ) command to SQL Server with a half second or so timeout. This setup allows me to send messages serially and receive and handle them serially as well. If I needed to send messages in parallel I’d have to use more than one conversation. I believe I could still receive the messages serially, but I’d have to be very careful with when the conversation lock was released, as discussed in this msdn article, Conversation Group Locks.

Hopefully this summary has shown how flexible and powerful Service Broker is. And also how easy it is, despite the fact that the learning curve is pretty high and the setup curve is pretty high too (you have to create at least 6 different new database objects at the bare minimum to get a service broker conversation going). With luck SQL Server 2008 wont deprecate all this information.

Wednesday, May 9, 2007

Blob performance concerns

Maybe someone can help me answer this question. I originally wrote about this on 3/2/2006 but never really found a good answer.

Suppose you are writing a program in C# that uses SQL Server 2005 and you'd like to store some files as Blobs (Binary Large OBjectS) in the database.

This is a very simple thing to do. Take the bytes, send them to SQL Server 2005 where you store them in a varbinary(max) column.

The code for this in C# looks like:

FileStream fs = new FileStream( "file" );
byte[] buff = new byte[fs.Length];
fs.Read( buff, 0, fs.Length );

SqlCommand sc = new SqlCommand(...);
sc.Parameters.Add( ...buff... );

My concern is that this code is going to read the entire file into memory. Then its going to send all that memory to SQL server in one shot. If this is a sufficiently large file I'd expect there to be problems on both the client and the server because of this.

If the client is somewhat lacking for memory, or if the file is really large, then I'd expect this code to cause a lot of disk paging. Possibly even paging out the beginning of the file you just read from disk to make room for the end of the file, just to turn around and page out the end of the file to make room for the beginning so it can send those bytes to SQL. I'm not clear such a thing could ever actually happen, but I do suspect some paging will occur.

What I'd really like to do is specify a stream as the parameter instead of a byte array. That way I could trust the driver was only loading a portion of the full file into a buffer at a time. It seems the JDBC driver supports this, but the SqlClient driver for ADO.NET doesn't.

So my questions are, am I blowing this way out of proportion? Is there a better way to insert a Blob in SQL from C#? Why does no one else on the internet address this concern?

Partly because of this problem, along with some other performance concerns, I ended up storing the files on a shared file server instead of as Blobs in SQL. This allowed to me to handle the transfer myself using the typical buffered approach.