DirectCompute

Fri, February 19, 2010, 04:00 PM under ParallelComputing | GPGPU
In my previous blog post I introduced the concept of GPGPU, ending with:
On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side to set up and execute the kernels that run on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as a "kernel".
In this post I want to share some links to get you started with DirectCompute and HLSL.

1. Watch the recording of the PDC 09 session: DirectX11 DirectCompute.

2. If session recordings are your thing, there are two more on DirectCompute from NVIDIA's GTC09 conference: 1015 (pdf, mp4) and 1411 (mp4, plus the presenter's written version of the session).

3. Over at gamedev there is an old Compute Shader tutorial. At the same site, there is a 3-part blog post on Compute Shader: Introduction, Resources and Addressing.

4. From PDC, you can also download the DirectCompute Hands On Lab.

5. When you are ready to get your hands even dirtier, download the latest Windows DirectX SDK (at the time of writing, that is the February 2010 release).

6. Within the SDK you'll find a Compute Shader Overview and samples such as: Basic, Sort, OIT, NBodyGravity, HDR Tone Mapping.

7. Talking of DX11/DirectCompute samples, there are also a couple of good ones at this URL.

8. The documentation for the various APIs is available online. Here are some good (but far from complete) entry points to give you a taste: numthreads, SV_DispatchThreadID, SV_GroupThreadID, SV_GroupID, SV_GroupIndex, D3D11CreateDevice, D3DX11CompileFromFile, CreateComputeShader, Dispatch, D3D11_BIND_FLAG, GSSetShader.
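
To give you a feel for how those pieces fit together, below is a heavily condensed C++ sketch that compiles an HLSL compute shader from an in-memory string and dispatches it to double every element of a float buffer. Treat it as an outline rather than production code: all error handling and cleanup are omitted, the function and variable names are mine, and the device, context and unordered access view are assumed to have been created elsewhere (via D3D11CreateDevice, CreateBuffer with D3D11_BIND_UNORDERED_ACCESS from the D3D11_BIND_FLAG enumeration, and CreateUnorderedAccessView). I use D3DCompile rather than D3DX11CompileFromFile simply so the HLSL source can live inside the listing.

    #include <d3d11.h>
    #include <d3dcompiler.h>
    #include <cstring>

    // The kernel (i.e. the compute shader), written in HLSL.
    const char* g_hlslSource =
        "RWStructuredBuffer<float> data : register(u0);\n"
        "[numthreads(64, 1, 1)]\n"
        "void main(uint3 dtid : SV_DispatchThreadID)\n"
        "{\n"
        "    data[dtid.x] *= 2.0f;\n"
        "}\n";

    void DoubleOnGpu(ID3D11Device* device, ID3D11DeviceContext* context,
                     ID3D11UnorderedAccessView* dataUav, UINT elementCount)
    {
        // Compile the HLSL source against the compute shader 5.0 profile.
        ID3DBlob* bytecode = NULL;
        D3DCompile(g_hlslSource, strlen(g_hlslSource), NULL, NULL, NULL,
                   "main", "cs_5_0", 0, 0, &bytecode, NULL);

        // Turn the compiled bytecode into a compute shader object.
        ID3D11ComputeShader* shader = NULL;
        device->CreateComputeShader(bytecode->GetBufferPointer(),
                                    bytecode->GetBufferSize(), NULL, &shader);

        // Bind the shader and the data, then launch enough 64-thread
        // groups (matching [numthreads] above) to cover every element.
        // D3D11 discards out-of-range UAV writes, so the last partial
        // group is safe even when elementCount isn't a multiple of 64.
        context->CSSetShader(shader, NULL, 0);
        context->CSSetUnorderedAccessViews(0, 1, &dataUav, NULL);
        context->Dispatch((elementCount + 63) / 64, 1, 1);
    }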

GPGPU

Fri, February 19, 2010, 03:58 PM under ParallelComputing | GPGPU
What
GPU obviously stands for Graphics Processing Unit (the silicon powering the display you are using to read this blog post). The extra GP in front of that stands for General Purpose computing.

So, altogether GPGPU refers to computing we can perform on a GPU for purposes beyond just drawing on the screen. In effect, we can use the GPU a bit like we already use a CPU: to perform some calculation that doesn't have to have any visual element to it. The attraction is that, for the right kind of workload, the GPU can be orders of magnitude faster than the CPU.

Why
When I was at the SuperComputing conference in Portland last November, GPGPUs were all the rage. A quick online search reveals many articles introducing the GPGPU topic. I'll just share three here: pcper (only the first page is worth reading, but it gives a good consumer perspective), gizmodo (a nice take using mostly layman's terms) and vizworld (answering the question of "what's the big deal?").

The GPGPU programming paradigm is, at a high level, simple: in your CPU program you define functions (aka kernels) that take some input, perform the costly operation and return the output. The kernels are the things that execute on the GPU, leveraging its power (and hence running faster than they would on the CPU), while the host CPU program waits for the results or asynchronously performs other tasks.
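
To make that host/kernel split concrete, here is a deliberately simplified C++ illustration (the function names are mine, and this is not code for any actual GPGPU platform): the CPU version owns the loop, whereas the kernel expresses only the per-element work and relies on the GPU runtime to supply the index.

    // CPU version: a single function owns the loop over the data.
    void scale_on_cpu(float* data, int count, float factor)
    {
        for (int i = 0; i < count; ++i)
            data[i] *= factor;
    }

    // Kernel version: the loop disappears. The body describes the work
    // for ONE element; a real GPU launches thousands of threads, each
    // handed its own index i by the runtime (e.g. SV_DispatchThreadID
    // in HLSL, or threadIdx/blockIdx in CUDA C).
    void scale_kernel(float* data, int i, float factor)
    {
        data[i] *= factor;
    }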

However, GPUs have different characteristics to CPUs, which means they are suitable only for certain classes of problem (i.e. data parallel algorithms) and not for others (e.g. algorithms with heavy branching, recursion or other complex flow control). You also pay a high cost for transferring the input data from the CPU to the GPU (and, in the other direction, the results back to the CPU), so the computation itself has to be substantial enough to justify the transfer overhead: if, say, copying the data over and back cost 20ms, a kernel that saved only 5ms of CPU compute time would be a net loss. If your problem fits these criteria then you probably want to check out this technology.

How
So where can you get a graphics card to start playing with all this? At the time of writing, the two main GPU vendors, ATI (owned by AMD) and NVIDIA, are the obvious places to look. You can read about GPGPU on this AMD page and also on this NVIDIA page. NVIDIA's website also has a free chapter on the topic from the "GPU Gems" book: A Toolkit for Computation on GPUs.

If you followed the links above, then you've already come across some of the programming models available today. Essentially, AMD offers its ATI Stream technology, accessible via a language it calls Brook+; NVIDIA offers its CUDA platform, accessible from CUDA C. Choosing either of those locks you into that GPU vendor, so your code cannot run on systems with cards from the other vendor (imagine if your CPU code ran on Intel chips but not on AMD chips). Having said that, both vendors plan to support a new emerging standard called OpenCL, which in theory means your kernels can execute on any GPU that supports it. To learn more about all of these there is a website: gpgpu.org. The caveat is that (currently) it completely ignores the Microsoft approach, which I touch on next.

On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side to set up and execute the kernels that run on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as a "kernel". For a comprehensive collection of links about this (including tutorials, videos and samples) please see my blog post: DirectCompute.

Note that there are many efforts to build even higher-level languages on top of DirectX, aiming to expose GPGPU programming to a wider audience by making it as easy as today's mainstream programming models. I'll mention just two of those efforts here: Accelerator from MSR and Brahma by Ananth.

Code for Parallelism Features Tour

Fri, February 5, 2010, 10:54 PM under ParallelComputing
Last year I linked to a screencast that shows off many VS2010 features delivered by the Parallel Computing team.

There have been requests for the code used to demonstrate the features. As with all my screencasts, you can see all the code in action, so you could simply type it in yourself. To save you doing that, though, you may download the two files with the demo code here: MM.cs and Program.cs. HTH.