PLINQ

Sun, January 25, 2009, 06:59 PM under ParallelComputing | LINQ
With VS2008 (more specifically .NET Framework 3.5) a wonderful thing was introduced: LINQ. Given the declarative nature of Language Integrated Query, it was a prime candidate for injecting automatic parallelization (i.e. making queries run faster by seamlessly taking advantage of multiple cores). The result of those efforts is what I mentioned 18 months ago (Parallel LINQ) and followed with a screencast 4 months later: see the 2nd link in the list here. In this post I'll do a written overview based on the latest bits. Before we continue, you should understand that PLINQ applies only to LINQ to Objects (i.e. IEnumerable-based sources where lambdas are bound to delegates, not IQueryable-based sources where the lambdas are bound to expressions). It also does not interfere with the deferred execution principles of LINQ, of course.

PLINQ as a black box
PLINQ is really simple if you want to treat it as a black box; all you do as a user is add an .AsParallel() extension method call to the source of your LINQ query and you are done! The following query
var result =
    from x in source
    where [some condition]
    select [something]
...can be parallelized as follows:
var result =
    from x in source.AsParallel()
    where [some condition]
    select [something]
Notice that the only difference is the AsParallel method call appended to the source and we can of course use this pattern with more complex queries.

Why Does It Work
To understand why the above compiles we have to remind ourselves of how LINQ works and that the first version of the code above is really equivalent to:
var result = source.Where(x => [some condition]).Select(x => [something]);
...so when we parallelize it we are simply changing it to be the following:
var result = source.AsParallel().Where(x => [some condition]).Select(x => [something]);
In other words the call to AsParallel returns something that also has the typical extension methods of LINQ (e.g. Where, Select and the other 100+ methods). However, with LINQ these methods live in the static System.Linq.Enumerable class whereas with PLINQ they live in the System.Linq.ParallelEnumerable class. How did we transition from one to the other? Well, AsParallel is itself an extension method on IEnumerable types and all it does is a "smart" cast of the source (the IEnumerable) to a new type which means the extension methods of this new type are picked up (instead of the ones directly on IEnumerable). In other words, by inserting the AsParallel method call, we are swapping out one implementation (Enumerable) for another (ParallelEnumerable). And that is why the code compiles fine when we insert the AsParallel method. For a more precise understanding, in the VS editor simply right click on AsParallel, choose Go To Definition and follow your nose from there…
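To make that "smart cast" mechanism concrete, here is a toy sketch of my own (MySequence, AsMine and MyExtensions are made-up names, not part of PLINQ) showing how wrapping a source in a new type changes which extension methods the compiler binds to:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// A thin wrapper type, analogous in spirit to ParallelQuery<T>.
public class MySequence<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    public MySequence(IEnumerable<T> source) { _source = source; }
    public IEnumerator<T> GetEnumerator() { return _source.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class MyExtensions
{
    // Analogous to AsParallel: the "smart cast" that changes the static type...
    public static MySequence<T> AsMine<T>(this IEnumerable<T> source)
    {
        return new MySequence<T>(source);
    }

    // ...so that this Where is picked by overload resolution instead of
    // Enumerable.Where, because MySequence<T> is the more specific match.
    public static MySequence<T> Where<T>(this MySequence<T> source, Func<T, bool> predicate)
    {
        return new MySequence<T>(Enumerable.Where(source, predicate));
    }
}

class Demo
{
    static void Main()
    {
        // Same query shape as before; only the AsMine call changed which
        // Where implementation is bound at compile time.
        var evens = Enumerable.Range(1, 10).AsMine().Where(x => x % 2 == 0);
        Console.WriteLine(evens.Count()); // prints 5
    }
}
```

PLINQ does exactly this swap, except its wrapper's operators dispatch to parallel implementations rather than simply delegating back to Enumerable.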

How Does It Work
OK, so we can see why the above compiles when we change the original sequential query with our parallelised query, which we now understand is based on the introduction of new .NET 4 types such as ParallelQuery and ParallelEnumerable – all in System.Core.dll in the System.Linq namespace. But how does the new implementation take advantage (by default when it is worth it) of all the cores on your machine? Remember our friendly task-based programming model? The implementation of the methods of the static ParallelEnumerable class uses Tasks ;-). Given that the implementation is subject to change and more importantly given that we have not shipped .NET 4 yet, I will not go into exactly how it uses the Tasks, but I leave that to your imagination (or to your decompiler-assisted exploration ;)).

Simple Demo Example
Imagine a .NET 4 Console project with a single file and 3 methods, 2 of which are:
static void Main()
{
    Stopwatch sw = Stopwatch.StartNew();
    DoIt();
    Console.WriteLine("Elapsed = " + sw.ElapsedMilliseconds.ToString());
    Console.ReadLine();
}

static bool IsPrime(int p)
{
    int upperBound = (int)Math.Sqrt(p);
    for (int i = 2; i <= upperBound; i++)
    {
        if (p % i == 0) return false;
    }
    return true;
}
…without worrying too much about the implementation details of IsPrime (I stole this method from the walkthrough you get in the VS2010 CTP). So the only question is where the 3rd method is, which clearly must be named DoIt. Here you go:
static void DoIt()
{
    IEnumerable<int> arr = Enumerable.Range(2, 4000000);
    var q =
        from n in arr
        where IsPrime(n)
        select n.ToString();
    List<string> list = q.ToList();
    Console.WriteLine(list.Count.ToString());
}
Now if you run this you will notice that on your multi-core machine only 1 core gets used (e.g. 25% CPU utilization on my quad core). You'll also notice in the console the number of milliseconds it took to execute. How can you make this execute much faster (~4 times faster on my machine) by utilizing 100% of your total CPU power? Simply change one line of code in the DoIt method:
from n in arr.AsParallel()
How cool is that?

Can It Do More
The PLINQ implementation partitions your source container into multiple chunks in order to operate on them in parallel. You can configure things such as the degree of parallelism, control ordering, specify buffering options, choose whether to run parts of the query sequentially etc. To experiment with all that, just explore the other new extension methods (e.g. AsOrdered, AsUnordered) and, finally, the new enumerations (e.g. ParallelQueryMergeOptions). I leave that experimentation to you dear reader ;)
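As a taster (based on the current bits, so names and shapes may change before .NET 4 ships), here is the prime-counting query from above capped at 2 cores and asked to preserve the ordering of the source:

```csharp
using System;
using System.Linq;

class PlinqTuningDemo
{
    static bool IsPrime(int p)
    {
        int upperBound = (int)Math.Sqrt(p);
        for (int i = 2; i <= upperBound; i++)
        {
            if (p % i == 0) return false;
        }
        return true;
    }

    static void Main()
    {
        var primes =
            Enumerable.Range(2, 100)
                      .AsParallel()
                      .AsOrdered()                 // preserve the source ordering in the results
                      .WithDegreeOfParallelism(2)  // cap the query at 2 cores
                      .Where(n => IsPrime(n))
                      .ToList();

        // Thanks to AsOrdered, the primes come out in ascending order
        // even though the work was partitioned across threads.
        Console.WriteLine(primes.Count + " primes, first = " + primes.First());
    }
}
```

Note that ordering is not free: PLINQ has to buffer and merge results to honor it, so only ask for it when your scenario needs it.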

Insert and Format Images in Your pptx

Sun, January 18, 2009, 11:54 AM under AboutPresenting
I have given quite a few technical presentations in my time and anyone that has attended one will tell you that I believe in demo-driven sessions (I have *never* given a session that had less than 50% demo time and most of them met my goal to be close to 75%+). Having said that, the few slides that a session has are important and what is equally important in my opinion is to strive for an image per slide!

If you can't find an image that conveys the message of the slide, then maybe your slide is trying to convey too much; if the image does not fit on your slide, then maybe your slide is too busy; if you can't tie an image to the message, then maybe you can insert some humorous image. So, I think of it as a quality gate for my slide: if I can’t insert an image, there is something wrong with the slide. If you don’t agree with that, then still insert an image in order to please the people that think more visually than others and also to add some color to your deck ;-)

After you have inserted an image, please use the tools offered by PowerPoint to make it aesthetically pleasing. When you select the image, a new tab appears in the PowerPoint 2007 ribbon with tons of options - explore them:


It is surprising how many times people ask me how I created a glow effect or a reflection (aka mirroring) effect etc. Depending on your personal preferences and the theme of your deck, some options work better than others, but by far my favorite and the one I start with as a default is preset 5:

Please try it now on a slide: insert an image twice and apply the preset on one and leave the other "plain/default". Can you see the difference in quality? Try it projected on a huge screen and you'll never go back…

There you have it! I shared the secret to the images in my decks ("big deal" I know, but oddly it took me some time to be comfortable sharing this nonetheless ;-)

Moth Calendar 2009

Sat, January 17, 2009, 02:57 AM under Random
When I lived in the UK I was always part of the developer community: in the early days of my career as an attendee, later as an MVP and, finally, as a Microsoft person when I joined the company.

It sounds like the community people in the UK miss my interactions as much as I do, because the other day my approval was sought for a 2009 calendar of community events where in each month there is a picture of me (sounds weird I know!). I gave my permission and Craig posted the result on his blog.

Besides 12 photos of my ugly mug accompanied by (what they think are) funny captions, each page has details of the UK community events taking place that month (I suspect that is the main purpose of the 2009 Community Calendar ;-)

Windows 7 and Server 2008 R2

Fri, January 9, 2009, 12:01 AM under Windows
The Betas of Windows 7 and Windows Server 2008 R2 are available to download. This is the release that supports up to 256 cores, and you can see a screenshot of a machine running the OS with 96 cores (!) on Mike's blog.

Parallelising Loops in .NET 4

Wed, January 7, 2009, 05:58 AM under ParallelComputing
Often the source of performance issues in our code is loops, e.g. while, for, foreach. With .NET 4 it becomes easy to make such code perform better by taking advantage of multiple cores.

Parallel.ForEach
For example, given:
IEnumerable<string> arr = ...
foreach (string item in arr) {
    // Do something
}
...we can parallelise it as follows:
Parallel.ForEach<string>(arr, delegate(string item) {
    // Do something
});
...or the tidier, directly equivalent version (dropping the superfluous generic type argument, which can be inferred, and turning the anonymous method syntax into a lambda statement):
Parallel.ForEach(arr, (string item) => {
    // Do something
});

Visual Distinctions
Notice the visual similarities that make it almost mechanical to parallelise a loop: the only differences in the parallel version are in the first line (the "arr" and "string item", which are the real pieces of information, are rearranged) and, after the closing brace at the end, a closing parenthesis and semicolon. The crucial observation is that the body of the loop remains intact.

Why Does It Work
Let's drill into why the modified code compiles and why it is equivalent in intent (even if it is obvious to some). We turned a block of code into a library method call. The .NET 4 (mscorlib) library offers the static class Parallel that (among others) offers the ForEach method. One of its overloads (its simplest) accepts 2 parameters: a source IEnumerable<TSource> and a body of code (in the form of an Action<TSource> delegate, of course) that accepts a single parameter which is also of TSource, of course. The method takes the body and calls it once for each element in the source. If you reread the last 2 sentences you'll find that this is exactly what the original loop construct does as well. The real difference here is that the original runs serially (using only a single core) while the modified version runs in parallel (using, by default, all cores).

How Does It Work
Those of you that don't like black magic boxes will ask: what does that method actually do inside in order to run things in parallel? My answer: what do you think it needs to do? Remember our friendly task-based programming model? The implementation of the methods of the static Parallel class uses Tasks (and specifically SelfReplicating tasks). Given that the implementation is subject to change and more importantly given that we have not shipped .NET 4 yet, I will not go into exactly how it uses the Tasks, but I leave that to your imagination (or to your decompiler-assisted exploration ;)).

Trivial Demo Example
In a .NET 4 Console project paste the following in the Program.cs file:
static string[] arr = Directory.GetFiles(@"C:\Users\Public\Pictures\Sample Pictures", "*.jpg");

static void SimulateProcessing()
{
    Thread.SpinWait(100000000);
}

static string TID
{
    get
    {
        return " TID = " + Thread.CurrentThread.ManagedThreadId.ToString();
    }
}
Now in the empty Main function paste the following:
foreach (string ip in arr) {
    Program.SimulateProcessing();
    Console.WriteLine(ip + TID);
}
Console.ReadLine();
Run it and notice how long it takes, that (of course) only one thread gets used and, in Task Manager, the CPU usage. Now change the loop construct so it is as follows:
Parallel.ForEach(arr, (string ip) => {
    Program.SimulateProcessing();
    Console.WriteLine(ip + TID);
});
Re-run it and notice how much faster it runs, how the number of threads equals the number of cores on your machine and, in Task Manager, that the CPU usage is at 100%.

Why Not Do It Automatically
Many that see this technology ask: "Why not automatically change all loops to run parallelised?". The answer is that you cannot blindly apply a Parallel.ForEach wherever you have a foreach loop. If the body of the loop depends on some shared state, or if the loop iterations are not independent of one another, then parallelising may introduce race conditions. Ultimately, it is multiple threads that execute the body in parallel, so there is no room for shared state etc. The static methods of the Parallel class have no magic to deal with that – it is still down to you. If you find yourself needing synchronization in the loop body, be sure to measure the performance, because locks and such in a parallelisation scenario potentially negate (or severely limit) the benefits of parallelisation. It is for these reasons that parallelising a loop is an opt-in decision today that only you can make for your code.
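To see the shared-state trap in action, here is a small sketch of mine (not from the walkthrough): counting even numbers with a plain increment loses updates under Parallel.ForEach, while an Interlocked.Increment does not:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class RaceDemo
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000000).ToArray();

        // BROKEN: count++ is a read-modify-write on shared state, so two
        // threads can read the same value and one increment gets lost.
        int brokenCount = 0;
        Parallel.ForEach(data, item => { if (item % 2 == 0) brokenCount++; });

        // CORRECT: Interlocked.Increment makes each update atomic.
        int safeCount = 0;
        Parallel.ForEach(data, item => { if (item % 2 == 0) Interlocked.Increment(ref safeCount); });

        // safeCount is always 500000; brokenCount will typically fall short.
        Console.WriteLine("broken = " + brokenCount + ", safe = " + safeCount);
    }
}
```

Even the Interlocked fix is not free: per the point above, contention on that single counter limits scaling, so measure before and after.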

A related question is why we do not embed this functionality in the language (the obvious suggestion being the introduction of a pfor loop construct). The answer is that offering it as a library instead of building it into the language allows it to be used by *all* .NET languages instead of restricting it to a few. Also, once something is embedded into the language it typically stays there forever, so we take great care with such decisions; e.g. it is too early to tie C# or VB to the System.Threading.Tasks namespace.

For the distant imaginary future, we are thinking about automatically parallelising bits of code if we can (with hints from the application developer with regards to purity and side-effect-free regions of code) and also embedding parallel constructs deeper into the language. Too early to know if anything will come of that...

Can It Do More
Yes! We only saw above one of the overloads of one of the methods. Parallel.ForEach has ~20 other overloads, some of them taking up to 5 arguments and all of them having a return type too; what I am trying to say is that there is much more flexibility and richness even in this simple API. I encourage you to explore the other overloads and also the other two methods on the Parallel class: For and Invoke.
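To whet your appetite for that exploration, here is a quick sketch (using the simplest overloads in the current bits; shapes may change before .NET 4 ships) of the other two methods:

```csharp
using System;
using System.Threading.Tasks;

class ParallelOthersDemo
{
    static void Main()
    {
        // Parallel.For: like a for loop from 0 (inclusive) to 10 (exclusive),
        // with iterations potentially running on different cores. Each
        // iteration writes to its own slot, so no synchronization is needed.
        int[] squares = new int[10];
        Parallel.For(0, 10, i => { squares[i] = i * i; });

        // Parallel.Invoke: run a fixed set of independent actions in parallel
        // and return when they have all completed.
        Parallel.Invoke(
            () => Console.WriteLine("action A"),
            () => Console.WriteLine("action B"),
            () => Console.WriteLine("action C"));

        Console.WriteLine(squares[7]); // prints 49
    }
}
```

Note the actions may print in any order, which is exactly the point: you hand over independent work and let the scheduler place it on the cores.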

Best of "The Moth" 2008

Thu, January 1, 2009, 01:25 AM under Personal
Happy New Year! Regular readers know that on this day I gather links to my own favorite blog posts of the past year (like I did in 2004, 2005, 2006 and 2007). Enjoy the 18 links below (out of the 122 blog entries I made in 2008)!

01. Visual Studio 2008
At the start of the year I completed my multi-month series on VS2008 and .NET 3.5 topics by writing a short article for TechNet and a longer one for QBS. I also recorded more screencasts on this topic including about Client App Services, Sync Services and the MAF. I linked to those 3 from the resources post of the session I performed/delivered most in 2008: Five VS2008 Smart Client Features.

02. Silverlight 2 Beta 1
After putting VS2008 behind me, I spent a lot of my time getting up to speed on Silverlight 2 and creating a (what turned out to be a very popular and highly ranked :-) session in the Beta 1 timeframe. I blogged a lot about the technology and most of my posts are linked to from this single Silverlight post.

03. Presentation Tips
Early in the year I wrote 2 posts to help you with the basics of setting up your machine for the most important part of a presentation (the demos): Setting Up the Laptop and Setting Up Visual Studio.

04. Other non-technical
This was the year I transitioned from Europe to the US, the side effects including a blog post with a list for settling in that others found useful: Getting a USA life. The transition was also to a new role joining the hordes of Microsoft people that spend a lot of time in Outlook – this inspired me to come up with some Email Rules.

05. Debugging
After settling in, I found myself living in the Visual Studio debugger quite a bit and sharing (via the blog) findings, advice and tips. For example: name your threads, 2 cool tips, make object id, debuggerdisplayattribute and, my favorite, understanding the terminology behind active and current stack frame (and current thread).

06. Parallelism
No surprise that parallelism is featured on this blog this year (as it was last year) and it should be no surprise that it will continue to be prominent here next year. My goal is to deliver shorter posts in the future, but for now you can use a cup of your favorite beverage while consuming my thoughts on: Threading vs Parallelism, Fine Grained Parallelism, Not Explicitly Using threads for Parallelism, the CLR 4 ThreadPool engine and the new Task type.

Thank you for reading, make sure you don't miss a post in 2009 by subscribing to this blog – click on the link on the left.