GPGPU

Fri, February 19, 2010, 03:58 PM under ParallelComputing | GPGPU
What
GPU obviously stands for Graphics Processing Unit (the silicon powering the display you are using to read this blog post). The extra GP in front of that stands for General Purpose computing.

So, altogether GPGPU refers to computing we can perform on GPU for purposes beyond just drawing on the screen. In effect, we can use a GPGPU a bit like we already use a CPU: to perform some calculation (that doesn’t have to have any visual element to it). The attraction is that a GPGPU can be orders of magnitude faster than a CPU.

Why
When I was at the SuperComputing conference in Portland last November, GPGPUs were all the rage. A quick online search reveals many articles introducing the GPGPU topic. I'll just share 3 here: pcper (ignoring all pages except the first, it is a good consumer perspective), gizmodo (nice take using mostly layman terms) and vizworld (answering the question on "what's the big deal").

The GPGPU programming paradigm (from a high level) is simple: in your CPU program you define functions (aka kernels) that take some input, can perform the costly operation and return the output. The kernels are the things that execute on the GPGPU leveraging its power (and hence execute faster than what they could on the CPU) while the host CPU program waits for the results or asynchronously performs other tasks.

However, GPGPUs have different characteristics to CPUs which means they are suitable only for certain classes of problem (i.e. data parallel algorithms) and not for others (e.g. algorithms with branching or recursion or other complex flow control). You also pay a high cost for transferring the input data from the CPU to the GPU (and vice versa the results back to the CPU), so the computation itself has to be long enough to justify the overhead transfer costs. If your problem space fits the criteria then you probably want to check out this technology.

How
So where can you get a graphics card to start playing with all this? At the time of writing, the two main vendors ATI (owned by AMD) and NVIDIA are the obvious players in this industry. You can read about GPGPU on this AMD page and also on this NVIDIA page. NVIDIA's website also has a free chapter on the topic from the "GPU Gems" book: A Toolkit for Computation on GPUs.

If you followed the links above, then you've already come across some of the choices of programming models that are available today. Essentially, AMD is offering their ATI Stream technology accessible via a language they call Brook+; NVIDIA offers their CUDA platform which is accessible from CUDA C. Choosing either of those locks you into the GPU vendor and hence your code cannot run on systems with cards from the other vendor (e.g. imagine if your CPU code would run on Intel chips but not AMD chips). Having said that, both vendors plan to support a new emerging standard called OpenCL, which theoretically means your kernels can execute on any GPU that supports it. To learn more about all of these there is a website: gpgpu.org. The caveat about that site is that (currently) it completely ignores the Microsoft approach, which I touch on next.

On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels that run on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as a "kernel". For a comprehensive collection of links about this (including tutorials, videos and samples) please see my blog post: DirectCompute.

Note that there are many efforts to build even higher level languages on top of DirectX that aim to expose GPGPU programming to a wider audience by making it as easy as today's mainstream programming models. I'll mention here just two of those efforts: Accelerator from MSR and Brahma by Ananth.