Best of "The Moth" 2011

Sun, January 1, 2012, 08:13 AM under Personal

Once again (like in 2004, 2005, 2006, 2007, 2008, 2009, 2010) the time has come to wish you a Happy New Year and to share my favorite posts from the year we just left behind.

1. My first blog entry in January and last one in December were both about my Windows Phone app: Translator by Moth and Translator by Moth v2. In between, I shared a few code snippets for Windows Phone development including a watermark textbox, a scroll helper, an RTL helper and a network connectivity helper - there will be more coming in 2012.

2. Using Microsoft Office products efficiently is the hallmark of an efficient Program Manager (and not only of Program Managers), and I'll continue sharing tips in that area on this blog. An example from last year is tracking changes in a SharePoint-hosted Word document.

3. Half-way through last year I moved from managing the parallel debugger team to managing the C++ AMP team (both of them in Visual Studio 11). That means I had to deprioritize sharing content on VS parallel debugging features (I promise to do that in 2012), and it also meant that I wrote a lot about C++ AMP. You'll need a few cups of coffee to go through all of it, and most of the links were aggregated on this single highly recommended post: Give a session on C++ AMP – here is how

You can stay tuned for more by subscribing via one of the options on the left…


Translator by Moth v2

Fri, December 16, 2011, 10:10 PM under MobileAndEmbedded

If you are looking for the full manual for this Windows Phone app you can find it here: "Translator by Moth".

While the manual has no images (just text), in this post I will share images and if you like them, go get "Translator by Moth" from the Windows Phone marketplace.

open the app from the app list or through a pinned tile (including secondary tiles for specific translations)

[Screenshots: appList, tile]

language picker (~40 languages)

[Screenshots: from, to]

"current" page

[Screenshots: current, sip, it, appBar, CopyPaste]

"saved" page

[Screenshots: savedEmpty, saved, saved_CM]

"about" page

[Screenshot: about]

Like? Go get Translator by Moth!


.NET access to the GPU for compute purposes

Thu, December 1, 2011, 07:42 PM under GPGPU | ParallelComputing

In the distant past I talked about GPGPU and Microsoft's approach at the time, DirectCompute. Since then, of course, we have C++ AMP coming out with Visual Studio 11, so there is a mainstream, easier way for developers to access the GPU for compute purposes, using C++.

The question occasionally arises of how a .NET developer can access the GPU for compute purposes from their C# (or VB) code. The answer is by interoping from the managed code to a native DLL, and using C++ AMP in the native DLL.

As a long term .NET developer myself, I can tell you this is straightforward. Sure, there could have been a managed wrapper for C++ AMP, but honestly that is the reason we have interop – it doesn't make much sense to invest resources to solve a problem that is already solved (most developer customers would prefer investments in other areas of Visual Studio!). Besides, interoping from C# to C++ is much easier than interoping to some of the other older approaches of GPGPU programming ;-)
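To make the shape of that interop concrete, here is a minimal sketch of what the native side could look like. The function name `square_array`, the macro `NATIVE_EXPORT` and the in-place squaring are all made up for illustration, and the loop body is a plain CPU placeholder standing where a C++ AMP kernel would go, so this is not taken from any of the posts linked below.

```cpp
// Export with C linkage so the managed side can find the function by its
// plain, unmangled name. NATIVE_EXPORT is a hypothetical macro name.
#if defined(_WIN32)
#define NATIVE_EXPORT extern "C" __declspec(dllexport)
#else
#define NATIVE_EXPORT extern "C"
#endif

// Hypothetical entry point: squares every element of a caller-owned buffer
// in place. A real DLL would run a C++ AMP kernel here; this placeholder
// loop runs on the CPU so the sketch builds with any C++ compiler.
NATIVE_EXPORT void square_array(float* data, int count)
{
    if (data == nullptr || count <= 0)
        return; // nothing to do, and never touch an invalid buffer

    for (int i = 0; i < count; ++i)
        data[i] = data[i] * data[i];
}
```

On the C# side, the matching piece would be a static extern method marked with [DllImport] that names your DLL and takes a float[] plus an int. The calling convention declared there has to agree with the one the native function is compiled with.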

To help you get started with the interop approach, Igor Ostrovsky has previously shared the "Hello World" version of interoping from C# to C++ AMP in his blog post:

…we then were asked specifically about how to interop from C# to C++ AMP in a Metro style application on Windows 8, so Igor delivered again with this post:

Have fun!


Windows 8 Task Manager

Fri, November 4, 2011, 08:47 PM under Windows

If you are a user of Task Manager (btw, make sure you've read my Task Manager shortcut tips), you must read the blog post on the overhaul coming to Task Manager in Windows 8 – cool stuff!

Also, long time readers of my blog will know that back in 2008 I wrote about Windows Vista and Windows 7 number_of_cores support, and in 2009 I shared a widely borrowed screenshot of Task Manager from one of our 128-core machines. So I was excited to just read on the Windows 8 blog that Windows 8 will support up to 640 cores. They shared a screenshot of a 160-core machine, so there goes my record ;-)


Short interview on C++ AMP

Mon, October 17, 2011, 05:04 PM under GPGPU | ParallelComputing

While at the BUILD conference a month ago, I ran into Bruno Boucard, who asked me a few questions about C++ AMP. I just returned from vacation to find that he uploaded the 15-minute interview, so here is a direct link to YouTube:

http://www.youtube.com/watch?v=a2IbOe_ogGE

What DX level does my graphics card support? Does it go to 11?

Fri, October 14, 2011, 05:13 PM under GPGPU | ParallelComputing

Recently I ran into a situation that I have run into quite a few times. Someone encounters a machine and the question arises: "Is there a DirectX 11 card in this machine?". Typically the reason you care is that cards with DirectX 11 drivers fully support DirectCompute (and by extension C++ AMP) for GPGPU programming. The driver model specifically is WDDM (1.1 on Windows 7; Windows 8 introduces WDDM 1.2 with cool new capabilities).

There are many ways to figure out if you have a DirectX 11 card, so here are the approaches that you can use, with a bonus right at the end of the post.

Run DxDiag

WindowsKey + R, type DxDiag and hit Enter. That is the DirectX diagnostic tool, which unfortunately only tells you, on the "System" tab, the highest version of DirectX installed on your machine. So if it reports DirectX 11, that doesn't mean you have a DX11 driver! The "Display" tab has a promising "DDI version" label, but unfortunately that doesn't seem to be accurate on the machines I've tested it with (or I may be misinterpreting its use). Either way, this tool is not the one you want for this purpose, although it is good for telling you the WDDM version among other things.

Use the Microsoft hardware page

There is a Microsoft Windows 7 compatibility center that lists all hardware (tip: use the advanced search), and you could try to locate your device there… good luck.

Use Wikipedia or the hardware vendor's website

Use the Wikipedia page for the vendor cards, for both NVIDIA and AMD. Often this information will also be in the specifications for the cards on the IHV site, but it is nice that Wikipedia has a single page per vendor that you can search etc. There is a column in the tables for API support where you can see the DirectX version.

Check if it is one of these recommended DX11 cards

You may not have a DirectX 11 card and are interested in purchasing one. While I am in no position to make recommendations, I will list here some cards from two big IHVs that we know are DirectX 11 capable.

  • Some AMD (aka ATI) cards
    • Low end, inexpensive DX11 hardware:
      • Radeon 5450, 5550, 6450, 6570
    • Mid range (decent perf, single precision):
      • Radeon 5750, 5770, 6770, 6790
    • High end (capable of double precision):
      • Radeon 5850, 5870, 6950, 6970
    • Single precision APUs:
      • AMD E-Series APUs
      • AMD A-Series APUs
  • Some NVIDIA cards
    • Low end, inexpensive DX11 hardware:
      • GeForce GT 430, GT 440, GT 520, GTS 450
      • Quadro 400, 600
    • Mid-range (decent perf, single precision):
      • GeForce GTX 460, GTX 550 Ti, GTX 560, GTX 560 Ti
      • Quadro 2000
    • High end (capable of double precision):
      • GeForce GTX 480, GTX 570, GTX 580, GTX 590, GTX 595
      • Quadro 4000, 5000, 6000
      • Tesla C2050, C2070, C2075

Get the DirectX SDK and run DirectX Caps Viewer

Download and install the June 2010 DirectX SDK. As part of that you now have the DirectX Capabilities Viewer utility (find it in your start menu by searching for "DirectX Caps Viewer", the filename is DXCapsViewer.exe). It will list all your devices (emulated, and real hardware ones) under the first node. Expand the hardware entries and then expand again the Direct3D 11 folder. If you see D3D_FEATURE_LEVEL_11_ under that, then your card supports feature level 11 which means it supports DirectCompute and C++ AMP. In the following screenshot of one of my old laptops, the card only goes to feature level 10.

[Screenshot: DirectX Caps Viewer]

Run a utility from the web that just tells you!

Of course, writing some C++ AMP code that enumerates accelerators and lists the ones that are capable is trivial. However, that requires that you have redistributed the runtime, so a more broadly applicable approach is to use the DX APIs directly to enumerate the DX11 capable cards. That is exactly what the development lead for C++ AMP has done, and he describes and shares that utility in this post.


Give a session on C++ AMP – here is how

Wed, September 21, 2011, 06:53 PM under GPGPU | ParallelComputing

Ever since presenting on C++ AMP at the AMD Fusion conference in June, then the Gamefest conference in August, and the BUILD conference in September, I've had numerous requests about my material from folks that want to re-deliver the same session. The C++ AMP session I put together has evolved over the 3 presentations to its final form that I used at BUILD, so that is the one I recommend you base yours on.

Please get the slides and the recording from channel9 (I'll refer to slide numbers below).

This is how I've been presenting the C++ AMP session:

Context

  1. (slide 3, 04:18-08:18) Start with a demo, on my dual-GPU machine. I've been using the N-Body sample (for VS 11 Developer Preview).
  2. (slide 4) Use an NVIDIA slide that has additional examples of performance improvements that customers enjoy with heterogeneous computing.
  3. (slide 5) Talk a bit about the differences today between CPU and GPU hardware, leading to the fact that these will continue to co-exist and that GPUs are great for data parallel algorithms, but not much else today. One is a jack of all trades and the other is a number cruncher.
  4. (slide 6) Use the APU example from AMD as one indication that the hardware space is still in motion, emphasizing that the C++ AMP solution is a data parallel API, not a GPU API. It has a future-proof design for hardware we have yet to see.
  5. (slide 7) Provide more meta-data, as blogged about when I first introduced C++ AMP.

Code

  1. (slide 9-11) Introduce C++ AMP coding with a simplistic array-addition algorithm – the slides speak for themselves.
  2. (slide 12-13) index<N>, extent<N>, and grid<N>.
  3. (slide 14-16) array<T,N>, array_view<T,N> and comparison between them.
  4. (slide 17) parallel_for_each.
  5. (slide 18, 21) restrict.
  6. (slide 19-20) Actual restrictions of restrict(direct3d) – the slides speak for themselves.
  7. (slide 22) Bring it all together with a matrix multiplication example.
  8. (slide 23-24) accelerator, and accelerator_view.
  9. (slide 26-29) Introduce tiling incl. tiled matrix multiplication [tiling probably deserves a whole session instead of 6 minutes!].
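If you need a quick way to explain the idea behind tiling before showing the slides, a CPU-only analogy can help. The sketch below is plain C++, not C++ AMP code and not what is on the slides: it multiplies two square matrices block by block, so each small block of the inputs is reused while it is still close to hand. The function name `tiled_multiply` and the row-major layout are assumptions made for this illustration. In C++ AMP the same partitioning is mapped onto groups of GPU threads instead of loop nests.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes C = A * B for square n x n matrices stored
// row-major in flat vectors (a and b are assumed to hold n * n elements).
// The three outer loops walk over tiles; the three inner loops do the
// ordinary multiply-accumulate inside one tile.
std::vector<float> tiled_multiply(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t n, std::size_t tile)
{
    if (tile == 0)
        tile = 1; // guard against a zero step in the tile loops

    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i0 = 0; i0 < n; i0 += tile)
        for (std::size_t j0 = 0; j0 < n; j0 += tile)
            for (std::size_t k0 = 0; k0 < n; k0 += tile) {
                // Clamp tile edges so n need not be a multiple of tile.
                const std::size_t i_end = std::min(i0 + tile, n);
                const std::size_t j_end = std::min(j0 + tile, n);
                const std::size_t k_end = std::min(k0 + tile, n);
                for (std::size_t i = i0; i < i_end; ++i)
                    for (std::size_t k = k0; k < k_end; ++k) {
                        const float a_ik = a[i * n + k];
                        for (std::size_t j = j0; j < j_end; ++j)
                            c[i * n + j] += a_ik * b[k * n + j];
                    }
            }
    return c;
}
```

The result does not depend on the tile size; only the order in which the partial products are accumulated changes, which is the point to get across before moving on to the tiled matrix multiplication on the slides.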

IDE

  1. (slide 34,37) Briefly touch on the concurrency visualizer. It supports GPU profiling, but we hope that enhancements specific to C++ AMP will come in the Beta timeframe, which is when I'll be spending more time talking about it.
  2. (slide 35-36, 51:54-59:16) Demonstrate the GPU debugging experience in VS 11.

Summary

  1. (slide 39) Re-iterate some of the points of slide 7, and add the point that the C++ AMP spec will be open for other compiler vendors to implement, even on other platforms (in fact, Microsoft is actively working on that).
  2. (slide 40) Links to content – see slide – including where all your questions should go: http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads.

 

"But I don't have time for a full blown session, I only need 2 (or just 1, or 3) C++ AMP slides to use in my session on related topic X"

If all you want is a small number of slides, you can take some from the session above and customize them. But because I am so nice, I have created some slides for you, including talking points in the notes section. Download them here.


GPU Debugging with VS 11

Tue, September 20, 2011, 07:21 PM under GPGPU | ParallelComputing | VisualStudio

With VS 11 Developer Preview we have invested tremendously in parallel debugging for both CPU (managed and native) and GPU debugging. I'll be doing a whole bunch of blog posts on those topics, and in this post I just wanted to get people started with GPU debugging, i.e. with debugging C++ AMP code.

First I invite you to watch a 6-minute glimpse of the C++ AMP debugging experience through this video (ffw to minute 51:54, up until minute 59:16). Don't read the rest of this post, just go watch that video, ideally download the High Quality WMV.

Summary

GPU debugging essentially means debugging the lambda that you pass to the parallel_for_each call (plus any functions you call from the lambda, of course). CPU debugging means debugging all the code above and below the parallel_for_each call, i.e. all the code except the restrict(direct3d) lambda and the functions that it calls. With VS 11 you have to choose what debugger you want to use for a particular debugging session, CPU or GPU. So you can place breakpoints all over your code, then choose what debugger you want (CPU or GPU), and you'll only be able to hit breakpoints for the code type that the debugger engine understands – the remaining breakpoints will appear as unbound. If you want to hit the unbound breakpoints, you'd have to stop debugging, and start again with the other debugger. Sorry. We suck. We know. But once you are past that limitation, I think you'll find the experience truly rewarding – seriously!

Switching debugger engines

With the Developer Preview bits, one way to switch the debugger engine is through the project properties – see the screenshots that follow.

This one is showing the CPU option selected, which is basically the default that you are all familiar with:

[Screenshot: project properties with the CPU debugger option selected]

This screenshot is showing the GPU option selected, by changing the debugger launcher (notice that this applies for both the local and remote case):

[Screenshot: project properties with the GPU debugger option selected]

You actually do not have to open the project properties just for switching the debugger engine, you can switch the selection from the toolbar in VS 11 Developer Preview too – see following screenshot (the effect is the same as if you opened the project properties and switched there)

[Screenshot: switching the debugger from the VS 11 toolbar]

Breakpoint behavior

Here are two screenshots, one showing a debugging session for CPU and the other a debugging session for GPU (notice the unbound breakpoints in each case)

[Screenshot: CPU debugging session]

…and here is the GPU case (where we cannot bind the CPU breakpoints but can bind the GPU breakpoint, which is actually hit)

[Screenshot: GPU debugging session]

Give C++ AMP debugging a try

So to debug your C++ AMP code, pull down the drop down under the 'play' button to select the 'GPU C++ Direct3D Compute Debugger' menu option, then hit F5 (or the 'play' button itself). Then you can explore debugging through the menus under Debug and Debug->Windows. One way to do that exploration is through the C++ AMP debugging walkthrough on MSDN.

Another way to explore the C++ AMP debugging experience is to use the moth.cpp code file, which is what I used in my BUILD session debugger demo. Note that for my demo I was using the latest internal VS11 bits, so your experience with the Developer Preview bits won't be identical to what you saw me demonstrate, but it shouldn't be far off.

Stay tuned for a lot more content on the parallel debugger in VS 11, both CPU and GPU, both managed and native.


Running C++ AMP kernels on the CPU

Mon, September 19, 2011, 07:32 PM under GPGPU | ParallelComputing

One of the FAQs we receive is whether C++ AMP can be used to target the CPU.

For targeting multi-core we have a technology we released with VS2010 called PPL, which has had enhancements for VS 11 – that is what you should be using! FYI, it also has a Linux implementation via Intel's TBB which conforms to the same interface.

When you choose to use C++ AMP, you choose to take advantage of massively parallel hardware, through accelerators like the GPU.

Having said that, you can always use the accelerator class to check if you are running on a system where there is no hardware with a DirectX 11 driver, and decide what alternative code path you wish to follow.

In fact, if you do nothing in code and the runtime does not find DX11 hardware to run your code on, it will choose the WARP accelerator, which will run your code on the CPU, taking advantage of multi-core and SSE2 (depending on the CPU capabilities WARP also uses SSE3 and SSE 4.1 – it does not currently use AVX, and on such systems you hopefully have a DX 11 GPU anyway).

A few things to know about WARP

  • It is our fallback CPU solution, not intended as a primary target of C++ AMP.
  • WARP stands for Windows Advanced Rasterization Platform and you can read old info on this MSDN page on WARP.
  • What is new in Windows 8 Developer Preview is that WARP now supports DirectCompute, which is what C++ AMP builds on.
  • It is not currently clear if we will have a CPU fallback solution for non-Windows 8 platforms when we ship.
  • When you create a WARP accelerator, its is_emulated property returns true.
  • WARP does not currently support double precision.
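The facts above suggest a simple decision procedure for picking a code path, sketched here in plain C++. To be clear about the assumptions: `AcceleratorInfo`, `CodePath` and `choose_code_path` are hypothetical names invented for this sketch, and the struct is only a stand-in that mirrors the two facts mentioned in the list (emulated or not, double precision or not). It is not the C++ AMP accelerator class, and you would fill it from whatever the real runtime reports.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for what a runtime reports about one accelerator.
// NOT the C++ AMP accelerator class; field names are assumptions.
struct AcceleratorInfo {
    std::string description;
    bool is_emulated;               // true for CPU-based emulators such as WARP
    bool supports_double_precision; // false for WARP per the list above
};

// The three outcomes this sketch distinguishes.
enum class CodePath { Hardware, Emulated, OwnCpuCode };

// An accelerator is usable if the kernel does not need doubles,
// or if the accelerator supports them.
bool can_run(const AcceleratorInfo& a, bool needs_double)
{
    return !needs_double || a.supports_double_precision;
}

// Prefer real hardware, then a usable emulator, and otherwise fall back
// to your own CPU code (for example a multi-core implementation).
CodePath choose_code_path(const std::vector<AcceleratorInfo>& accelerators,
                          bool needs_double)
{
    bool emulated_ok = false;
    for (const AcceleratorInfo& a : accelerators) {
        if (!can_run(a, needs_double))
            continue;
        if (!a.is_emulated)
            return CodePath::Hardware;
        emulated_ok = true;
    }
    return emulated_ok ? CodePath::Emulated : CodePath::OwnCpuCode;
}
```

With only a WARP-like entry in the list, a single precision kernel would land on the emulated path, while a kernel that needs doubles would be sent to your own CPU code.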

 

BTW, when we refer to WARP, we refer to the accelerator described above. If we use lower case "warp", that refers to a bunch of threads that run concurrently in lock step and share the same instruction. In the VS 11 Developer Preview, the size of a warp in our Ref emulator is 4 – Ref is another emulator that runs on the CPU, but it is extremely slow and not intended for production, just for debugging.


Links to C++ AMP and other content

Fri, September 16, 2011, 10:00 PM under GPGPU | Links | ParallelComputing | VisualStudio | Windows

A few links you may be interested in.