Introducing the new Task type

Tue, December 30, 2008, 06:46 AM under ParallelComputing
In a previous post I argued that we need to partition our compute-bound operations finely, and I enumerated the benefits of fine-grained parallelism. In another post I showed why it is a mistake to use Threads directly to achieve fine-grained parallelism: the unit of partitioning in our user-mode app was also the unit of scheduling of the OS.

System.Threading.Tasks.Task
We are introducing in mscorlib of .NET 4 the System.Threading.Tasks.Task type that represents a lightweight unit of work. The code from my previous post would look like this with Tasks (and it does not suffer from any of the 3 problems that the original code suffers from):
static void WalkTree(Tree tree)
{
    if (tree == null) return;
    Task left = new Task((o) => WalkTree(tree.Left));
    left.Start();
    Task righ = new Task((o) => WalkTree(tree.Righ));
    righ.Start();
    left.Wait();
    righ.Wait();
    ProcessItem(tree.Data);
}
Tasks run on the new, improved CLR 4 ThreadPool engine. I will not repeat the performance and load-balancing benefits in this post, but will instead focus on the rich API itself.

Creation and Scheduling
We saw one example of the API above, where we used a Task with the same pattern that we use with threads (create, then later start). You can see another example of the creation API if we modify the original Main method to look like this:
static void Main()
{
    Tree tr = Tree.CreateSomeTree(9, 1);
    Stopwatch sw = Stopwatch.StartNew();
    Task t = Task.StartNew(delegate { WalkTree(tr); });
    t.Wait();
    Console.WriteLine("Elapsed= " + sw.ElapsedMilliseconds.ToString());
    Console.ReadLine();
}
Notice how we can create Tasks and start them with a single statement (StartNew), which is similar to how we use the ThreadPool.QueueUserWorkItem with the added benefit of having the reference to the work in the form of the variable 't'.
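
The two creation patterns can be put side by side in a small, self-contained sketch. It assumes the API names of the final .NET 4 release, where the one-step call is Task.Factory.StartNew rather than the Task.StartNew used in this post; the CreationSketch class and its counter are hypothetical, for illustration only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CreationSketch
{
    // Runs the same hypothetical work item two ways and returns how many times it ran.
    public static int Run()
    {
        int runs = 0;
        Action work = () => Interlocked.Increment(ref runs);

        // Pattern 1: create now, start later (the same pattern we use with threads).
        Task created = new Task(work);
        created.Start();

        // Pattern 2: create and schedule in a single statement,
        // keeping a reference to the work in the variable 'started'.
        Task started = Task.Factory.StartNew(work);

        created.Wait();
        started.Wait();
        return runs;
    }

    public static void Main()
    {
        Console.WriteLine("Work items run: " + Run());
    }
}
```

Either way we end up holding a Task reference, which is what ThreadPool.QueueUserWorkItem never gave us.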

Waiting
Also notice above how we preserve the semantics of the code prior to the change by waiting for the work to complete before the Console.WriteLine statement. We saw the same Wait method earlier in WalkTree. In fact, in WalkTree we can replace the two calls (left.Wait and righ.Wait) with the more flexible Task.WaitAll(left, righ), and there are other options, such as a WaitAny method that blocks only until any one of the tasks you pass into it completes.
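
To make the difference between the two static waiting methods concrete, here is a minimal sketch, again assuming the final .NET 4 names (Task.Factory.StartNew). The WaitingSketch class and the gate that holds the second task back are hypothetical scaffolding so that the order of completion is predictable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitingSketch
{
    // Returns the index (into the argument list) of the task that WaitAny saw complete.
    public static int Run()
    {
        using (var gate = new ManualResetEventSlim(false))
        {
            Task quick = Task.Factory.StartNew(() => { });
            Task gated = Task.Factory.StartNew(() => gate.Wait()); // cannot finish until the gate opens

            // Blocks only until any one of the tasks completes.
            int first = Task.WaitAny(quick, gated);

            gate.Set();

            // Blocks until every task passed in has completed.
            Task.WaitAll(quick, gated);
            return first;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Index of first task to complete: " + Run());
    }
}
```

Because the second task is held back until after WaitAny returns, the index reported is that of the first task.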

Continuations
We can further change the body of the Main method as follows:
Tree tr = Tree.CreateSomeTree(9, 1);
Stopwatch sw = Stopwatch.StartNew();
Task t = Task.StartNew(delegate { WalkTree(tr); });
t.ContinueWith(tt => Console.WriteLine("Done"), TaskContinuationKind.OnAny);
t.Wait(2500);
Console.WriteLine("Elapsed= " + sw.ElapsedMilliseconds.ToString());
Notice that we are waiting with a timeout this time, which means that after 2.5 seconds we will see "Elapsed..." on the console (given that our WalkTree work takes longer than that to complete). At that point, however, CPU usage will remain at 100% because our work is still executing. When it completes, and CPU usage drops again, we will also see "Done" on the console. This should confirm your expectation of the self-explanatory ContinueWith method. It is a very powerful method (more here) that enables patterns such as pipelining. You can have many continuations off the same task, and you can configure the circumstances under which to continue via TaskContinuationKind, which I encourage you to explore along with the various overloads.
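
A chain of continuations can be sketched as follows. This assumes the final .NET 4 names, where the enumeration that configures a continuation is TaskContinuationOptions (for example OnlyOnRanToCompletion) rather than the TaskContinuationKind shown in this post; the ContinuationSketch class and its log are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ContinuationSketch
{
    // Runs a three-stage chain and returns the order in which the stages executed.
    public static string Run()
    {
        var log = new ConcurrentQueue<string>();

        Task work = Task.Factory.StartNew(() => log.Enqueue("work"));

        // Runs once 'work' finishes, however it finishes (the default).
        Task first = work.ContinueWith(t => log.Enqueue("step 1"));

        // Runs only if 'first' ran to completion without faulting or being cancelled.
        Task second = first.ContinueWith(
            t => log.Enqueue("step 2"),
            TaskContinuationOptions.OnlyOnRanToCompletion);

        second.Wait();
        return string.Join(",", log);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Each stage starts only after its antecedent has finished, which is the building block for pipelining.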

Cancellation
Cancellation is well integrated in the API. Cancelling a task that is scheduled in some queue and has not executed yet means that it will not be executed at all. For a task that is already running, cooperation is needed which means that the task can check a boolean property (IsCancellationRequested) to see if cancellation was requested and act accordingly. Finally, you can see if a task is actually cancelled via another boolean property (IsCanceled) on the Task type. If we modify the 2 lines of code above as follows:
t.ContinueWith(tt => Console.WriteLine("Done"));
t.Wait(2500);
t.Cancel();
...we will see the "Elapsed" message followed immediately by a drop in CPU utilization and the "Done" message.
Note that for the cancellation above to behave as expected, we are assuming that when we cancel a Task, all tasks created in that scope also get cancelled, i.e. when we cancel 't', all the tasks created in WalkTree also get cancelled. This is not the default, but we can easily configure it as such by changing the ctor call in WalkTree for both left and right to be as follows:
...= new Task((o) => WalkTree(tree.Left), TaskCreationOptions.RespectParentCancellation);
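
For comparison, here is a sketch of the same cooperative idea using the API as it shipped in the final .NET 4 release, where cancellation is requested through a CancellationTokenSource and observed through a CancellationToken, instead of the Task.Cancel method and RespectParentCancellation option shown above. The CancellationSketch class and its spinning loop are hypothetical stand-ins for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Starts a long-running task, cancels it, and reports whether it ended up cancelled.
    public static bool Run()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;
        var started = new ManualResetEventSlim(false);

        Task worker = Task.Factory.StartNew(() =>
        {
            started.Set();
            // Cooperation: the running task polls the flag between units of work.
            while (!token.IsCancellationRequested)
            {
                Thread.SpinWait(1000); // hypothetical unit of work
            }
            // Acknowledge the request so the task finishes as cancelled, not as completed.
            token.ThrowIfCancellationRequested();
        }, token);

        started.Wait();   // make sure the task is already running
        cts.Cancel();     // request cancellation

        try
        {
            worker.Wait();
        }
        catch (AggregateException)
        {
            // Waiting on a cancelled task surfaces the cancellation as an exception.
        }

        return worker.IsCanceled;
    }

    public static void Main()
    {
        Console.WriteLine("Worker cancelled: " + Run());
    }
}
```

Passing the same token to every task created in WalkTree would give the "cancel the whole tree" behavior that the text describes.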

Parent Child Relationships
The above correctly implies that there is a parent-child relationship between tasks that are created in the scope of an executing task. It is worth noting that parent tasks implicitly wait for their children to complete, which is why the waiting worked as expected further above. If we wanted to opt out of that, we could create detached children via the TaskCreationOptions.Detached option. I encourage you to experiment with the other TaskCreationOptions...
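
The implicit waiting can be seen in a small sketch. Note the assumption: it targets the final .NET 4 release, where the default is the reverse of what is described here, i.e. children are detached unless created with TaskCreationOptions.AttachedToParent. The ParentChildSketch class is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParentChildSketch
{
    // Returns true if waiting on the parent also waited for its attached child.
    public static bool Run()
    {
        bool childFinished = false;

        Task parent = Task.Factory.StartNew(() =>
        {
            // The child is created inside the parent's body and attached to it,
            // so the parent does not count as complete until the child is done.
            Task.Factory.StartNew(() =>
            {
                Thread.Sleep(100); // hypothetical work
                childFinished = true;
            }, TaskCreationOptions.AttachedToParent);
        });

        parent.Wait(); // no explicit wait on the child
        return childFinished;
    }

    public static void Main()
    {
        Console.WriteLine("Child finished before parent.Wait returned: " + Run());
    }
}
```

Dropping the AttachedToParent option in this sketch would let parent.Wait return while the child is still sleeping.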

Task with Result
Let's go way back and peek at the original serial implementation of WalkTree and let's modify it so it actually returns a result:
static int WalkTree(Tree tree)
{
    if (tree == null) return 0;
    int left = WalkTree(tree.Left);
    int righ = WalkTree(tree.Righ);
    return ProcessItem(tree.Data) + left + righ;
}
...as we ponder the question of "How do we parallelize that?", take a look again at the code at the top of this post, which parallelized the version that did not return results.
We can change it to return 0 when there are no more nodes and to return the result of ProcessItem, but we have an issue with how to obtain the results of WalkTree(tree.Left) and WalkTree(tree.Righ) and add them to our return value. In other words, we are passing a delegate that returns a result to the Task ctor, and we need somewhere to store that result. The obvious place to store it is the Task itself! However, we want this strongly typed, so we use generics: there is a type that inherits from Task, namely Task<T> (in the CTP bits it is called Future<T>). This new type has a Value property for retrieving the result; the call blocks if the task is still executing, and returns immediately if the task has already executed and the value is stored. So the code can be modified as follows:
static int WalkTree(Tree tree)
{
    if (tree == null) return 0;
    Task<int> left = new Task<int>((o) => WalkTree(tree.Left), TaskCreationOptions.RespectParentCancellation);
    left.Start();
    Task<int> righ = new Task<int>((o) => WalkTree(tree.Righ), TaskCreationOptions.RespectParentCancellation);
    righ.Start();
    return ProcessItem(tree.Data) + left.Value + righ.Value;
}
Note that if we did not want to block on Value then we could have queried the IsCompleted property of the Task.
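
Here is a complete, runnable sketch of the result-returning walk. It assumes the final .NET 4 names, where the property is called Result rather than Value and the task is started with Task.Factory.StartNew; the ResultNode type, its Build helper, and summing Data in place of ProcessItem are hypothetical stand-ins for the Tree type used in this post.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical minimal tree node standing in for the post's Tree type.
public sealed class ResultNode
{
    public int Data;
    public ResultNode Left;
    public ResultNode Righ;
}

public static class ResultSketch
{
    // Builds a full binary tree of the given depth in which every node holds 1.
    public static ResultNode Build(int depth)
    {
        if (depth == 0) return null;
        return new ResultNode { Data = 1, Left = Build(depth - 1), Righ = Build(depth - 1) };
    }

    // Sums the tree, walking the two subtrees as tasks that return results.
    public static int WalkTree(ResultNode tree)
    {
        if (tree == null) return 0;
        Task<int> left = Task.Factory.StartNew(() => WalkTree(tree.Left));
        Task<int> righ = Task.Factory.StartNew(() => WalkTree(tree.Righ));
        // Reading Result blocks until the corresponding task has produced its value.
        return tree.Data + left.Result + righ.Result;
    }

    public static void Main()
    {
        Console.WriteLine("Sum = " + WalkTree(Build(4)));
    }
}
```

A full tree of depth 4 has 15 nodes, so with every node holding 1 the walk sums to 15.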

In Summary
Above I have given you a brief glimpse of the rich API that Task exposes (and there is a lot more, such as a nice exception handling model that aggregates exceptions thrown in parallel into a single AggregateException). Combined with my other posts referenced above, you should feel comfortable (if not compelled) to use this new Task API in all parallelism scenarios where you previously considered using Threads or the ThreadPool directly. Furthermore, the rich API has hopefully enticed you to use it even if you had not considered the ThreadPool or threads before.
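
As a taste of the exception handling model mentioned above, the following sketch shows two tasks failing in parallel and a single AggregateException carrying both failures to the waiter. It assumes the final .NET 4 names; the ExceptionSketch class and the exception messages are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionSketch
{
    // Returns how many failures were aggregated into the exception seen by the waiter.
    public static int Run()
    {
        Task a = Task.Factory.StartNew(() => { throw new InvalidOperationException("a failed"); });
        Task b = Task.Factory.StartNew(() => { throw new ArgumentException("b failed"); });

        try
        {
            Task.WaitAll(a, b);
        }
        catch (AggregateException ex)
        {
            // One exception object, one inner exception per failed task.
            return ex.InnerExceptions.Count;
        }
        return 0;
    }

    public static void Main()
    {
        Console.WriteLine("Failures observed: " + Run());
    }
}
```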
Tuesday, 30 December 2008 07:45:00 (Pacific Standard Time, UTC-08:00)
Why was the decision made to get rid of the name 'Future'? A task with a result and a void (unit?) task is still a concept count of two. So how does 'unifying' concepts on the number of generic parameters (0 or 1) make things simpler?
Tuesday, 30 December 2008 12:46:00 (Pacific Standard Time, UTC-08:00)
I had to chuckle that it was "righ" everywhere in this post instead of "right".

Anyway, I'm curious why we have only Task now as well. In the BCL, Func and Action are two separate concepts, so it seems natural that Task and Future would be separate as well.
Tuesday, 30 December 2008 14:41:00 (Pacific Standard Time, UTC-08:00)
Hi Daniel,

In your first example you create a task with an Action`T (using a lambda with a single "o" parameter).

In the second example you use Task.StartNew and pass it a delegate that takes no parameters.

Are there different overloads for creating a new Task? Perhaps one that takes Action`T and one that only takes Action? If I create a new Task using Action`T, does its Start method have an overload I can pass some payload data into?

Thanks for the informative post!

Matt
Tuesday, 30 December 2008 18:02:00 (Pacific Standard Time, UTC-08:00)
In the examples where Task returned a value, you used the "new Task<int>" syntax but in the non-returning tasks you used a more type inference friendly syntax "Task.Create". Will there be an equivalent type inference friendly Task.Create for value returning Tasks?

I'm assuming yes but wanted to check.

Nitpick Corner: The last example of WalkTree specifies a void return type but returns a value.
Wednesday, 31 December 2008 00:03:43 (Pacific Standard Time, UTC-08:00)
Tom: I will refrain from giving you my reasoning (although I will say that I cheered when the change was made ;), but will come back to you with an official answer once I get one – it was not my personal decision.

wekempf: "Chuckle" is exactly the reaction that righ is aimed to provoke. It succeeded when I delivered this code live in conference sessions, and I am glad it worked in writing as well :-D

Matt: Good eyes :). Yes there are overloads for passing state or leaving it out. Of course you are not forced to use a delegate/lambda with an argument if you are not passing something in. I incorrectly used this in my example because of a limitation of the CTP bits which is what I originally wrote this code against. It has no side effects, but does raise the question you posed. Thanks for spotting it.

Jared: Good nitpick catch – fixed. Yes, there is a way to start and schedule a Task with results in one shot, just like you can with Tasks that don’t return results. The approach I used in WalkTree is consistent for both scenarios. I only used the StartNew in Main and I never introduced Futures there so I didn't get to show it. It is available in the September CTP, but we have refactored that area in M3 so there is little point going into it here ;)
Saturday, 03 January 2009 17:51:00 (Pacific Standard Time, UTC-08:00)
Tom, we're opting away from the Future naming for a variety of reasons. For one, we continually found ourselves describing futures as "tasks that return results"; the name "future" wasn't enough to evoke the necessary understanding, whereas Task<TResult> is much clearer to folks we've spoken with about it, in that the type name itself communicates much more about the type's purpose. There is also some discrepancy in the literature and concurrency circles about exactly what's implied by the name "future," what functionality should be exposed (and, more importantly, what functionality should not be exposed), what are the concurrency guarantees made by it (e.g. does a future require its body to be side-effect free), and so forth... it's helpful for us to avoid those issues by choosing a name that doesn't carry the same baggage. There's also the fact of Future deriving from Task. All of Task's instance methods are available to Future, and its static methods like Task.WaitAll/WaitAny are available as well; it makes more sense then from an API perspective to pass tasks into the Task.WaitAll method than to pass both tasks and futures into such a method: in other words, naming it "Future" has more of a not-related-to-Task implication. Also, in just referring to these offhand, it's easier to speak of both Task and Task<TResult> in terms of "tasks," rather than having to call them "tasks and futures" everywhere. This shows up as well in some of the debugger support that Daniel focuses on, e.g. the Parallel Tasks window shows tasks in your application, e.g. Task and Task<TResult>... it'd be odd if the Parallel Tasks window showed Future instances. And so on. All of those reasons are of course mostly subjective, but we felt they were strong enough to warrant a change. There are also positive implications for the change from systems like IntelliSense.

wekempf, it's true that Func and Action are two separate types, but it's also not the cleanest analogy. For one thing, there are many versions of these types with different generic arities, e.g. Action, Action<T1>, Action<T1,T2>, etc., and Func<TResult>, Func<T1,T2,TResult>, etc. Moreover, given a delegate Xyz<T1,T2,T3>, do you know whether T1,T2,T3 are for parameters to the delegate or the delegate's return value? If Action and Func both had the same name, i.e. Xyz, you couldn't have both Xyz<T1> and Xyz<TResult>, as they'd conflict. That issue doesn't exist with Task and Task<TResult>.

I hope that helps. Feedback is, as usual, very welcome. Thanks for the questions.
Friday, 16 January 2009 08:13:00 (Pacific Standard Time, UTC-08:00)
Hi Stephen,

Many thanks for the details behind the decision to unify Task and Future. Having read and thought about it some more I think you chaps are right. Looking forward to getting my paws on VS2KA Beta 1 so I can install it on Win '7 and start playing with the tweaked types.

Kind regards,

tom
Friday, 13 February 2009 08:00:00 (Pacific Standard Time, UTC-08:00)
Just curious when the Future concept will be phased out - I've got the latest FX library now and Futures are still there.
Friday, 13 February 2009 08:03:53 (Pacific Standard Time, UTC-08:00)
Hey Alex, to be clear, the concept is not going anywhere: we are just renaming it. The change will be publicly available with the next public drop of Visual Studio 2010.