Running C++ AMP kernels on the CPU

Mon, September 19, 2011, 07:32 PM under GPGPU | ParallelComputing

One of the FAQs we receive is whether C++ AMP can be used to target the CPU.

For targeting multi-core we have a technology we released with VS2010 called PPL, which has had enhancements for VS 11 – that is what you should be using! FYI, it also has a Linux implementation via Intel's TBB which conforms to the same interface.

When you choose to use C++ AMP, you choose to take advantage of massively parallel hardware, through accelerators like the GPU.

Having said that, you can always use the accelerator class to check if you are running on a system where there is no hardware with a DirectX 11 driver, and decide what alternative code path you wish to follow.

In fact, if you do nothing in code and the runtime does not find DX11 hardware to run your code on, it will choose the WARP accelerator, which will run your code on the CPU, taking advantage of multi-core and SSE2 (depending on the CPU capabilities, WARP also uses SSE3 and SSE 4.1; it does not currently use AVX, and on such systems you hopefully have a DX 11 GPU anyway).
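
The decision described above can be sketched roughly as follows. This is a portable model of the logic, not C++ AMP code: `AcceleratorInfo`, `CodePath` and `choose_path` are hypothetical names invented for illustration, and the `is_emulated` field merely stands in for the accelerator property of the same name. In a real program the list would be filled from whatever the runtime reports.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for what an accelerator object reports about itself.
struct AcceleratorInfo {
    std::string description;
    bool is_emulated;  // true for CPU-based emulators such as WARP or Ref
};

// The alternatives an application might choose between (names are assumptions).
enum class CodePath { AmpOnHardware, AmpOnWarp, CpuWithPpl };

// If any non-emulated (DX11 hardware) accelerator exists, run C++ AMP on it.
// Otherwise either accept the default WARP fallback or switch to a
// hand-written PPL code path, depending on application policy.
CodePath choose_path(const std::vector<AcceleratorInfo>& accelerators,
                     bool prefer_ppl_on_cpu) {
    for (const auto& acc : accelerators) {
        if (!acc.is_emulated) {
            return CodePath::AmpOnHardware;
        }
    }
    return prefer_ppl_on_cpu ? CodePath::CpuWithPpl : CodePath::AmpOnWarp;
}
```

Passing `prefer_ppl_on_cpu = false` corresponds to "doing nothing in code" and letting the WARP fallback run the kernel; passing `true` corresponds to following an alternative, CPU-specific code path.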

A few things to know about WARP

  • It is our fallback CPU solution, not intended as a primary target of C++ AMP.
  • WARP stands for Windows Advanced Rasterization Platform and you can read old info on this MSDN page on WARP.
  • What is new in Windows 8 Developer Preview is that WARP now supports DirectCompute, which is what C++ AMP builds on.
  • It is not currently clear if we will have a CPU fallback solution for non-Windows 8 platforms when we ship.
  • When you create a WARP accelerator, its is_emulated property returns true.
  • WARP does not currently support double precision.
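
The last point has a practical consequence: code that wants double precision needs a single precision alternative if it may end up on WARP. A minimal sketch of that check, assuming a capability flag is available to the application; `DeviceCaps`, `Kernel` and `pick_kernel` are hypothetical names, and `supports_double_precision` is an assumed field rather than something this post documents.

```cpp
#include <cassert>

// Hypothetical capability record for the accelerator that was selected.
struct DeviceCaps {
    bool supports_double_precision;  // assumed flag; false for WARP per the list above
};

enum class Kernel { DoublePrecision, SinglePrecision };

// Use the double precision kernel only when it is both wanted and supported;
// otherwise fall back to the single precision variant.
Kernel pick_kernel(const DeviceCaps& caps, bool wants_double) {
    if (wants_double && caps.supports_double_precision) {
        return Kernel::DoublePrecision;
    }
    return Kernel::SinglePrecision;
}

// WARP as described in this post: no double precision support.
const DeviceCaps warp_caps = {false};
```

With `warp_caps`, `pick_kernel` always selects the single precision kernel, regardless of what the caller asked for.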


BTW, when we refer to WARP, we refer to the accelerator described above. If we use lower case "warp", that refers to a bunch of threads that run concurrently in lock step and share the same instruction. In the VS 11 Developer Preview, the size of a warp in our Ref emulator is 4. Ref is another emulator that runs on the CPU, but it is extremely slow and not intended for production, just for debugging.

Sunday, September 25, 2011 2:52:42 PM (Pacific Daylight Time, UTC-07:00)
"•It is not currently clear if we will have a CPU fallback solution for non-Windows 8 platforms when we ship."

Why not?

Is it true, then, that you have pretty much screwed all of us that have processors with vector ISA capabilities above SSE2? Is that the takeaway?

SSE2 has been here for 10 years and the only reason why you decided to use it was, why, x86-64? You guys are hilarious.

Will it take another 10 years to get you to use modern ISAs on modern CPUs automatically and to give us the same capabilities as available in other tools (commercial and open source)?

Sunday, September 25, 2011 4:09:02 PM (Pacific Daylight Time, UTC-07:00)
Hi MikeW, after reading your comments, my most respectable interpretation is that you incorrectly think the post is about C++ in general and not about C++ AMP specifically. C++ AMP in v1 allows code to take advantage of massive parallelism by primarily targeting DX 11 hardware. For code that targets primarily CPU parallelism, PPL is the way to go. For more, do read the actual post and the links it points to.

As per my post, it is not currently known if there will be a C++ AMP automatic fallback CPU solution for non-Win8 platforms (not sure you quite internalized that the comment applies to C++ AMP). As per the post, the fallback CPU solution relies on WARP (which targets up to SSE 4.1). In the Developer Preview WARP is supported on Win8 only and that component is not owned by my team, hence I cannot comment on its availability on non-Win8 platforms at RTM. An alternative is for my team to offer an additional out-of-band CPU fallback option that also works on non-Win8 platforms (hint: it would be based on PPL), which we are considering but its priority is much lower than other items we are working on, so again, it is not known at this point if there will be a non-Win8 fallback CPU solution for C++ AMP.

Thanks for posting here, and for additional questions/comments/feedback please use the appropriate MSDN forum:

Monday, January 2, 2012 3:48:08 AM (Pacific Standard Time, UTC-08:00)
What is the point of having a CPU fallback if it has an even stricter condition before it will work?
Isn't the point of a fallback that it will run anyway?
Doesn't OpenCL fall back to the CPU anyway?

Monday, January 2, 2012 4:44:55 AM (Pacific Standard Time, UTC-08:00)
Hi Kedas, can you describe your exact specific scenario so I can explain how C++ AMP satisfies (or not) that scenario? Also what is the specific stricter condition that you refer to? I think I may know the answer to both questions, but I prefer not to guess.

In terms of a generic abstracted answer: the point of any feature X is to cover a set of scenarios Y, and if that set of scenarios Y does not include scenario Z, the person interested in scenario Z ends up questioning the point of the feature X. Such is life, as I am sure you know, being a developer yourself.