"Hello World" in C++ AMP

Sun, June 26, 2011, 06:02 PM under GPGPU | ParallelComputing

UPDATE: I encourage you to visit a newer and better post with a C++ AMP matrix multiplication example.

Some say that the equivalent of "hello world" code in the data parallel world is matrix multiplication :)

Below is the code before and after C++ AMP. For more on what it all means, watch the recording of my C++ AMP introduction (the example below is part of the session).

    #include <vector>
    using std::vector;

    // Serial version: C (M x N) = A (M x W) * B (W x N), all stored row-major.
    void MatrixMultiply(vector<float>& vC,
                        const vector<float>& vA,
                        const vector<float>& vB,
                        int M, int N, int W )
    {
        for (int y = 0; y < M; y++) {
            for (int x = 0; x < N; x++) {
                float sum = 0;
                for(int i = 0; i < W; i++)
                    sum += vA[y * W + i] * vB[i * N + x];
                vC[y * N + x] = sum;
            }
        }
    }

Change the function to use C++ AMP, and hence offload the computation to the GPU. The calling code (which I am not showing) needs no changes, and the overall operation gives you a really nice speed-up for large datasets…

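    #include <amp.h>
    #include <vector>
    using namespace concurrency;
    using std::vector;

    // C++ AMP version, in the pre-release syntax current when this was written.
    void MatrixMultiply(vector<float>& vC,
                        const vector<float>& vA,
                        const vector<float>& vB,
                        int M, int N, int W )
    {
        // Wrap the host vectors as 2D views over the data.
        array_view<const float,2>      a(M, W, vA);
        array_view<const float,2>      b(W, N, vB);
        array_view<writeonly<float>,2> c(M, N, vC);

        // The lambda runs once per element of c.
        parallel_for_each(c.grid,
            [=](index<2> idx) mutable restrict(direct3d)
            {
                float sum = 0;
                for(int i = 0; i < a.x; i++)
                    sum += a(idx.y, i) * b(i, idx.x);
                c[idx] = sum;
            }
        );
    }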

Again, you can understand the elements above by using my C++ AMP presentation slides and recording.
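The calling code is not shown in this post. As a rough sketch of what a caller might look like, here is the serial version wrapped in a tiny example. The helper name `MultiplySmallExample` and the 2x3 by 3x2 input values are assumptions made up for illustration; they are not from the session. Because both versions of `MatrixMultiply` share the same signature, a caller like this should not need to change when the body is swapped for the C++ AMP one.

```cpp
#include <vector>

using std::vector;

// Serial MatrixMultiply from the post: C (M x N) = A (M x W) * B (W x N).
void MatrixMultiply(vector<float>& vC,
                    const vector<float>& vA,
                    const vector<float>& vB,
                    int M, int N, int W)
{
    for (int y = 0; y < M; y++) {
        for (int x = 0; x < N; x++) {
            float sum = 0;
            for (int i = 0; i < W; i++)
                sum += vA[y * W + i] * vB[i * N + x];
            vC[y * N + x] = sum;
        }
    }
}

// Hypothetical caller (not from the post): multiplies a 2x3 by a 3x2 matrix.
vector<float> MultiplySmallExample()
{
    const int M = 2, N = 2, W = 3;
    vector<float> vA = { 1, 2, 3,
                         4, 5, 6 };    // M x W, row-major
    vector<float> vB = { 7,  8,
                         9, 10,
                        11, 12 };      // W x N, row-major
    vector<float> vC(M * N);           // the caller allocates the result
    MatrixMultiply(vC, vA, vB, M, N, W);
    return vC;                         // { 58, 64, 139, 154 }
}
```

Note that the caller owns all three vectors and passes the dimensions explicitly; the function only fills in `vC`.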

Stay tuned for more…

Links to C++ documentation

Wed, June 22, 2011, 05:30 PM under C++

After a recent talk I gave on C++ AMP, one attendee complained that they were not familiar with lambdas, and another found templates hard to parse. In case you are in the same boat, I thought I'd gather some essential reading material for you (it also gives me one link to refer people to in the future ;-)

Lambdas are available (in some shape or form) in all modern languages, so do yourself a favor and learn about them:

Templates have been around in modern languages for even longer than lambdas (e.g. Generics in .NET), so again go dive in:

In fact, why don't you refresh your knowledge and read the entire MSDN C++ Language Reference – that's what I am doing!
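If you just want a quick taste of the two features before diving into the reading, here is a minimal sketch in standard C++. The function names `SumOfSquares` and `AddOffset` are made up for illustration. It shows a function template whose type parameter is deduced at the call site, a lambda that captures by reference with `[&total]`, and a lambda that captures by value with `[=]`, which is the capture mode used by the C++ AMP lambda in the "Hello World" post.

```cpp
#include <algorithm>
#include <vector>

// A function template: T is deduced from the argument at the call site.
template <typename T>
T SumOfSquares(const std::vector<T>& values)
{
    T total = T();
    // [&total] captures 'total' by reference, so the lambda body can update it.
    std::for_each(values.begin(), values.end(),
                  [&total](const T& v) { total += v * v; });
    return total;
}

// [=] captures 'offset' by value: the lambda works on its own copy.
void AddOffset(std::vector<int>& values, int offset)
{
    auto addOffset = [=](int v) { return v + offset; };
    std::transform(values.begin(), values.end(), values.begin(), addOffset);
}
```

Calling `SumOfSquares` with a `std::vector<int>` instantiates the template for `int`, and calling it with a `std::vector<double>` instantiates it again for `double`, without any extra code on your part.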

If you are looking to keep up to date with what is happening in the C++ world, stay tuned on the Visual C++ team (aka WinC++ team) blog and ask questions in the C++ forums.

C++ AMP recording and slides

Fri, June 17, 2011, 02:51 PM under Events | GPGPU | ParallelComputing

Yesterday we announced C++ Accelerated Massive Parallelism.

Many of you want to know more about the API instead of just meta information. I will trickle more code over the coming months leading up to the date when we will share actual bits. Until you have bits in your hand, it is only your curiosity that is blocked, so I ask you to be patient with that and allow me to release this on our own schedule ;-)

You can now watch my 45-minute session introducing C++ AMP on Channel 9. You will also want to download the slides (pdf), because they are not readable in the recording.

C++ Accelerated Massive Parallelism

Wed, June 15, 2011, 09:16 AM under GPGPU | ParallelComputing

At AMD's Fusion conference, Herb Sutter announced in his keynote session a technology that our team has been working on, which we call C++ Accelerated Massive Parallelism (C++ AMP). During the keynote I showed a brief demo of an app built with our technology, and after the keynote I went deeper into the technology in my breakout session. If you read both those abstracts, you'll get some information about what C++ AMP is, although they are not too explicit, since we published the abstracts before the technology was announced.

You can find the official online announcement at Soma's blog post.

Here, I just wanted to capture the key points about C++ AMP that can serve as an introduction and an FAQ. So, in no particular order, C++ AMP…


  1. lowers the barrier to entry for heterogeneous hardware programmability and brings performance to the mainstream, without sacrificing developer productivity or solution portability.
  2. is designed not only to help you address today's massively parallel hardware (i.e. GPUs and APUs), but also to future-proof your code investments with a forward-looking design.
  3. is part of Visual C++. You don't need to use a different compiler or learn different syntax.
  4. is modern C++. Not C or some other derivative.
  5. is integrated and supported fully in Visual Studio 11. Editing, building, debugging, profiling and all the other goodness of Visual Studio work well with C++ AMP.
  6. provides an STL-like library as part of the existing concurrency namespace and delivered in the new amp.h header file.
  7. makes it extremely easy to work with large multi-dimensional data on heterogeneous hardware, in a manner that exposes parallelization.
  8. introduces only one core C++ language extension.
  9. builds on DirectX (and DirectCompute in particular), which offers a great hardware abstraction layer that is ubiquitous and reliable. The architecture is such that this point can be thought of as an implementation detail that does not surface to the API layer.

Stay tuned on my blog for more over the coming months where I will switch from just talking about C++ AMP to showing you how to use the API with code examples…

Speaking at AMD Fusion conference

Thu, June 9, 2011, 05:44 PM under Events | ParallelComputing

UPDATE: C++ AMP session recording and slides now available.

Next Wednesday at 2pm I will be presenting a session at the AMD Fusion developer summit in Bellevue, Washington State.

For more on this conference please visit the official website. If you filter the catalog by 'Speaker Last Name' to "Moth", you'll find my talk.

For your convenience, below are the title and abstract:

Blazing-fast code using GPUs and more, with Microsoft Visual C++

To get full performance out of mainstream hardware, high-performance code needs to harness, not only multi-core CPUs, but also GPUs (whether discrete cards or integrated in the processor) and other compute accelerators to achieve orders-of-magnitude speed-up for data parallel algorithms. How can you as a C++ developer fully utilize all that heterogeneous hardware from your Visual Studio environment? How can your code benefit from this tremendous performance boost without sacrificing your developer productivity or the portability of your solution? The answers will be presented in this session that introduces a new technology from Microsoft.

Hope to see many of you there!