Mort, Elvis, Einstein – WCF and Async

The “classic” classification problem has been around for quite a while at Microsoft (and everywhere else), and this reference is something you hear quite often. Nikhil Kothari’s post about Applying personas uses this analogy to describe types of programmers, and there is another interesting post at http://www.codinghorror.com/blog/archives/001004.html. The intent here is not to describe the personas, since those links do that, but to call out how some features are sometimes buried deep in the guts of WCF. Well, I may be exaggerating a bit here. Anyway, one of the challenges of being a performance developer on this team is that I rarely get to deal with simplistic scenarios.

Now, getting back to WCF. I have grown to admire the beauty of how its elements tie together so elegantly, and the scariest part is that this elegance is what makes it so performant. Form follows function. One of the basic principles is that the runtime is fully async, or, so to speak, tries to be fully async. This is primarily to avoid blocking of any kind. There is a trade-off when you make something async, besides the abysmal complexity of a series of Begins and Ends. On top of the overhead of a thread switch, you end up with another critical problem: the basic stack unwind time on the new thread and the overhead needed to enable it. What does that even mean?

Think about putting a set of calls like the following on the same thread.

A => B => C => D => E (execute and return)

Stack unwind is pretty straightforward, right? You just trace back and POP. You have the context, the locals, the members. You name it and the calling method has it.
So the call back just happens like

A <= B <= C <= D <= E (E is the starting point here)

So the call tree looks something like this.

A
|-- B
    |-- C
        |-- D
            |-- E

In an async call, the issue is that the initiating thread just ends its logical execution at E.

BeginA => BeginB => BeginC => BeginD => BeginE and returns

Sadly, all the other logical pieces of execution also have to stay in limbo until the inner Begin invocation completes. That means they have to stash their state onto something like an async result that the caller hands back to them when execution resumes. In short, the AsyncResult is nothing but a glorified stack pointer.

To duplicate what stack unwind does, we end up creating a bunch of results that hold the data for execution, and the callback path on the completion thread is a bit different. It would be as follows.

EndA <= EndB <= EndC <= EndD <= EndE (the thread actually starts here, but logical execution is the reverse)

Here the call tree is not so simple. You have the initial one that looks like this.

BeginA
|-- BeginB
    |-- BeginC
        |-- BeginD
            |-- BeginE

And the Ends happen on the other, completion thread, as shown below.

EndE
|-- EndD
    |-- EndC
        |-- EndB
            |-- EndA

So you end up paying double the stack unwind cost for an async call, provided that you do incur a thread switch and truly go async. Note that this does not apply to calls that complete synchronously, because they obey the regular unwind rules. If you are still wondering why go async at all, then it’s time to hit the books: we are waiting on “IO”, and holding on to a thread while we do IO is like buying a jumbo jet to get free peanuts. Again, exaggeration. But the point is that IO is way more expensive than a stack unwind, so it’s better to free up our precious CPU cycles for other hungry requests.

Just as a conclusion:
The async nature is not that simple to produce, but the returns on your investment are such that it just makes sense to go that route, not only for frameworks like WCF but also for other apps.

In my next post I’ll try to explain a knob added in WCF that is quite counterintuitive to a basic understanding of async and IO, and a part of WCF that’s not purely async per se. I was told by Wenlong to create this Elvis knob.
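
To make the “glorified stack pointer” idea above a bit more concrete, here is a minimal sketch in Python. It is not WCF code and not how WCF implements this. The names AsyncResult, begin, end, LAYERS and run_demo are all made up for illustration, and a short sleep on a second thread stands in for the IO. Each layer stashes its locals in a result object during the Begin chain, and the completion thread walks the Ends in reverse.

```python
import threading
import time

# Hypothetical layers standing in for A..E from the post.
LAYERS = ["A", "B", "C", "D", "E"]


class AsyncResult:
    """Holds what a stack frame would normally hold: locals plus where to resume."""

    def __init__(self, saved_locals, callback):
        self.saved_locals = saved_locals  # state the layer needs when it resumes
        self.callback = callback          # the caller's continuation
        self.value = None                 # filled in when the inner work completes


def begin(depth, callback, trace):
    """Begin chain: runs on the initiating thread and returns without blocking."""
    layer = LAYERS[depth]
    trace.append(("Begin" + layer, threading.current_thread().name))
    result = AsyncResult({"layer": layer}, callback)

    if depth == len(LAYERS) - 1:
        # Innermost layer: start the simulated IO on another thread and return.
        def complete_io():
            time.sleep(0.01)  # stand-in for real IO
            result.value = "payload"
            end(result, trace)

        threading.Thread(target=complete_io, name="completion").start()
    else:
        # Outer layers: resume only when the inner layer hands back its value.
        def on_inner_done(inner_value):
            result.value = inner_value
            end(result, trace)

        begin(depth + 1, on_inner_done, trace)
    return result


def end(result, trace):
    """End chain: runs on the completion thread, using the stashed state."""
    layer = result.saved_locals["layer"]
    trace.append(("End" + layer, threading.current_thread().name))
    result.callback(result.value)


def run_demo():
    trace = []
    done = threading.Event()
    final = {}

    def on_complete(value):
        final["value"] = value
        done.set()

    begin(0, on_complete, trace)  # returns right after BeginE
    done.wait(timeout=5)          # only the demo waits; the layers never block
    return trace, final.get("value")


if __name__ == "__main__":
    for step, thread_name in run_demo()[0]:
        print(step, "on", thread_name)
```

The trace lists the five Begins on the initiating thread, followed by EndE through EndA on the completion thread. That second walk is the extra unwind you pay for when the call truly goes async.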

Update: Sept-05-09 – Concurrent receives.

Here is the knob that we added for enabling multiple receives: Concurrent Receives – MaxPendingReceives