Dryad and DryadLINQ from MSR

Sat, November 13, 2010, 07:06 PM under HPC | ParallelComputing

Microsoft Research (MSR) researches technologies, incubates projects which many times result in technology that looks like a ready-to-use product (but it is important to understand that these are not the same as products built by the various… actual product teams here at Microsoft).

A very popular MSR project has been DryadLINQ, which itself builds on Dryad. To learn more follow the project pages I just linked to and I also recommend this 1-hour channel 9 video. If you only have 3 minutes, watch this great elevator pitch instead. You can also stay tuned on the official blog, which includes a post that refers to internal adoption e.g by Bing, a quick DryadLINQ code example, and some history on how DryadLINQ generalizes the MapReduce pattern and makes it accessible to regular programmers (see this post and that post).

Essentially, the DryadLINQ framework (building on the Dryad runtime) allows developers to re-use their LINQ skills for creating/generating programs that process large multi-gigabyte/terabyte datasets across 100s-1000s of machines. One way to think about it is that just as Parallel LINQ allows LINQ developers to seamlessly use multiple cores from a single process on a single machine, DryadLINQ allows LINQ developers to seamlessly use multiple machines for their data parallel algorithms. In the former scenario the motivation was speed of execution, in the latter it is speed of execution AND processing large datasets that simply don't fit on a single machine.

Whenever I hear about execution of parallel code on multiple machines on the Microsoft platform, I immediately think of Windows HPC Server. Indeed Dryad and DryadLINQ were made available for Windows HPC Server and I encourage you to watch the PDC session on this topic: Data-Intensive Computing on Windows HPC Server with the DryadLINQ Framework.

Watch this space…