The Moth

"Hello World" in C++ AMP

Sun, June 26, 2011, 06:02 PM under GPGPU | ParallelComputing

UPDATE: I encourage you to visit a newer and better post with a C++ AMP matrix multiplication example.

Some say that the equivalent of "hello world" code in the data parallel world is matrix multiplication :)

Below is the code before and after C++ AMP. For more on what it all means, watch the recording of my C++ AMP introduction (the example below is part of the session).

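    #include <vector>
    using std::vector;

    // Computes C = A * B, where A is M x W, B is W x N and C is M x N.
    // All three matrices are stored row-major in flat vectors.
    void MatrixMultiply(vector<float>& vC,
                        const vector<float>& vA,
                        const vector<float>& vB,
                        int M, int N, int W)
    {
        for (int y = 0; y < M; y++)            // row of C
        {
            for (int x = 0; x < N; x++)        // column of C
            {
                float sum = 0;
                for (int i = 0; i < W; i++)
                {
                    sum += vA[y * W + i] * vB[i * N + x];
                }
                vC[y * N + x] = sum;
            }
        }
    }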
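
The calling code is not shown in this post. As a minimal sketch of what a caller could look like, assuming the row-major layout that the indexing above implies, the block below multiplies a 2 x 3 matrix by a 3 x 2 matrix. The helper name `MultiplySmallExample` and the sample values are made up for illustration; the serial function is repeated only so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using std::vector;

// Serial MatrixMultiply from the post, repeated so this sketch is self-contained.
void MatrixMultiply(vector<float>& vC,
                    const vector<float>& vA,
                    const vector<float>& vB,
                    int M, int N, int W)
{
    for (int y = 0; y < M; y++)
    {
        for (int x = 0; x < N; x++)
        {
            float sum = 0;
            for (int i = 0; i < W; i++)
            {
                sum += vA[y * W + i] * vB[i * N + x];
            }
            vC[y * N + x] = sum;
        }
    }
}

// Hypothetical caller (not from the post): A is 2 x 3, B is 3 x 2, C is 2 x 2.
vector<float> MultiplySmallExample()
{
    const int M = 2, N = 2, W = 3;

    // Row-major storage: each row is laid out contiguously.
    const vector<float> vA = { 1, 2, 3,
                               4, 5, 6 };
    const vector<float> vB = {  7,  8,
                                9, 10,
                               11, 12 };

    // The caller owns the output buffer and must size it to M * N.
    vector<float> vC(static_cast<std::size_t>(M) * N);

    // The function itself does no bounds checking, so check sizes here.
    assert(vA.size() == static_cast<std::size_t>(M) * W);
    assert(vB.size() == static_cast<std::size_t>(W) * N);

    MatrixMultiply(vC, vA, vB, M, N, W);
    return vC;
}
```

Because the signature takes plain `std::vector<float>` references and three dimensions, a caller written this way does not depend on how the multiplication is carried out inside the function.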

Change the function to use C++ AMP and the computation is offloaded to the GPU. The calling code (which I am not showing) needs no changes, and the overall operation gets a really nice speedup for large datasets…

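    #include <amp.h>
    #include <vector>
    using namespace concurrency;
    using std::vector;

    // Written against the C++ AMP developer preview. In the released API,
    // restrict(direct3d) became restrict(amp), c.grid became c.extent,
    // writeonly<float> was dropped in favor of calling c.discard_data(),
    // and index components are read as idx[0] (row) and idx[1] (column).
    void MatrixMultiply(vector<float>& vC,
                        const vector<float>& vA,
                        const vector<float>& vB,
                        int M, int N, int W)
    {
        // 2-D views over the flat vectors: a is M x W, b is W x N, c is M x N.
        array_view<const float,2>      a(M, W, vA);
        array_view<const float,2>      b(W, N, vB);
        array_view<writeonly<float>,2> c(M, N, vC);

        // The lambda runs once per element of c.
        parallel_for_each(
            c.grid,
            [=](index<2> idx) mutable restrict(direct3d)
            {
                float sum = 0;
                for (int i = 0; i < a.x; i++)    // walk the shared dimension
                {
                    sum += a(idx.y, i) * b(i, idx.x);
                }
                c[idx] = sum;
            }
        );
    }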

Again, you can make sense of the elements above with the help of my C++ AMP presentation slides and recording…
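
To make the shape of the kernel concrete without a GPU, here is a CPU-only sketch in standard C++ of the same restructuring: the two outer loops disappear and a function is invoked once per output element. `Index2D`, `ForEachIndex` and `MatrixMultiplyPerElement` are hypothetical stand-ins invented for this illustration; they are not C++ AMP types or functions, and this stand-in runs the invocations one after another rather than in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using std::vector;

// Hypothetical stand-in for a 2-D index (not the C++ AMP index<2> type).
struct Index2D
{
    int y;  // row
    int x;  // column
};

// Hypothetical stand-in for parallel_for_each: calls kernel once per (y, x).
// The calls here are sequential; on a GPU many of them run at the same time.
template <typename Kernel>
void ForEachIndex(int rows, int cols, Kernel kernel)
{
    for (int y = 0; y < rows; y++)
    {
        for (int x = 0; x < cols; x++)
        {
            kernel(Index2D{ y, x });
        }
    }
}

void MatrixMultiplyPerElement(vector<float>& vC,
                              const vector<float>& vA,
                              const vector<float>& vB,
                              int M, int N, int W)
{
    assert(vC.size() == static_cast<std::size_t>(M) * N);

    // Row-major 2-D accessors over the flat vectors, playing the role of a and b.
    auto a = [&](int row, int col) { return vA[row * W + col]; };
    auto b = [&](int row, int col) { return vB[row * N + col]; };

    // Each invocation computes exactly one element of C and writes nowhere else,
    // which is what makes the invocations independent of each other.
    ForEachIndex(M, N, [&](Index2D idx)
    {
        float sum = 0;
        for (int i = 0; i < W; i++)
        {
            sum += a(idx.y, i) * b(i, idx.x);
        }
        vC[idx.y * N + idx.x] = sum;
    });
}
```

The point of the sketch is only that no invocation reads a value another invocation writes, so the order in which they run does not affect the result.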

Stay tuned for more…

Comments [2]

Wednesday, 29 June 2011 01:23:08 (Pacific Daylight Time, UTC-07:00)

Daniel, could you take some time to comment on the difficulty of parallelizing Strassen's algorithm (http://en.wikipedia.org/wiki/Strassen_algorithm) for matrix multiplication?

While multiplication with three nested for-loops is the easiest, it is not the most efficient.

Tanveer Badar

Tuesday, 05 July 2011 19:58:57 (Pacific Daylight Time, UTC-07:00)

Tanveer, if you have really large matrices, then Strassen would be a good option (you'd partition the data on the CPU and make multiple GPU kernel invocations). That is obviously not a Hello World example (the title of this blog post). When we ship bits, I'll be sure to include an example like that... thanks for the idea.

Daniel Moth
