
Introduction to Parallel Programming – Week 2

After learning the GPU programming model and writing a basic CUDA program, Week 2 introduces some concepts for building efficient algorithms.

Communication is the first issue: how do threads communicate efficiently? This is easier in some problems (e.g. Map) than in others (e.g. Scan/Sort). The memory access and write patterns of the key algorithm types are discussed; a sketch contrasting two of them follows the list. Input-to-output relationships:

  • Map: one-to-one
  • Gather: many-to-one
  • Stencil: several-to-one
  • Reduce: all-to-one
  • Scatter: one-to-many
  • Scan: all-to-all
  • Sort: all-to-all
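As a hedged illustration (hypothetical kernels, not code from the course; the array names in/out and size n are assumptions), here is a minimal CUDA sketch contrasting a one-to-one map with a many-to-one gather:

```cuda
__global__ void mapSquare(const float *in, float *out, int n)
{
    // Map: each thread reads one input element and writes one output
    // element – a one-to-one pattern with no possibility of collisions.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i];
}

__global__ void gatherAverage(const float *in, float *out, int n)
{
    // Gather: each thread reads several input elements (many-to-one) but
    // still writes only its own output location, so writes never collide.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
```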

The basic problem of concurrent memory access was illustrated via a scatter example: with each input trying to write a third of its value to its neighboring elements, we can see that trouble will result from independent thread execution.

One-to-many memory writes pose a problem: independent threads end up writing over the same memory locations.
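A hedged sketch of that scatter (hypothetical kernel; assumes out has been zero-initialized) makes the race visible: thread i and thread i+1 both increment out[i] and out[i+1]:

```cuda
__global__ void scatterBlurRacy(const float *in, float *out, int n)
{
    // Scatter: each thread writes a third of its value into three output
    // locations. Neighboring threads target overlapping addresses, so the
    // plain += below is a data race when threads run independently.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1) {
        out[i - 1] += in[i] / 3.0f;
        out[i]     += in[i] / 3.0f;
        out[i + 1] += in[i] / 3.0f;
    }
}
```

One conventional fix is to make each increment atomic, e.g. atomicAdd(&out[i - 1], in[i] / 3.0f), at a cost in throughput; the gather formulation shown earlier avoids the race entirely.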

To overcome these hazards, threads must communicate. Shared memory and synchronization barriers were described as the tools for this job; a sketch combining the two follows.
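As a hedged example (hypothetical kernel, assuming a dynamically sized shared-memory tile), threads in a block can stage data in shared memory and use __syncthreads() so that every write is visible before any thread reads a neighbor's element:

```cuda
__global__ void shiftLeft(const float *in, float *out, int n)
{
    extern __shared__ float tile[];   // sized to blockDim.x at launch time

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];    // each thread stages one element

    __syncthreads();                  // barrier: all writes now visible

    // Safe within the block: the neighbor's element is guaranteed written.
    if (i + 1 < n && threadIdx.x + 1 < blockDim.x)
        out[i] = tile[threadIdx.x + 1];
}
```

A launch would pass the shared-memory size as the third configuration parameter, e.g. shiftLeft<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n).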

Some important information about what CUDA guarantees (and doesn't guarantee) about thread execution was also touched on:

  • No guarantees about when and where thread blocks will run – this is an enabler for massive parallelism. It also brings some limitations: no assumptions can be made about which blocks will run on which SM, and there can be no direct communication between blocks.
  • Does guarantee – all threads in a block run on the same SM at the same time.
  • Does guarantee – all blocks in a kernel finish before any block from the next kernel runs (this is what makes the two-launch pattern sketched below work).
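Because blocks can't talk to each other inside a kernel, a global synchronization point is expressed as a kernel boundary. A hedged host-side sketch (stepOne and stepTwo are hypothetical kernel names):

```cuda
__global__ void stepOne(float *data, int n) { /* first phase */ }
__global__ void stepTwo(float *data, int n) { /* second phase */ }

// Every block of stepOne is guaranteed to finish before any block of
// stepTwo starts, so the launch boundary acts as a global barrier.
stepOne<<<numBlocks, threadsPerBlock>>>(d_data, n);
stepTwo<<<numBlocks, threadsPerBlock>>>(d_data, n);
```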

A good source for some of the hardware diagrams and acronyms: www.cs.nyu.edu/manycores/cuda_many_cores.pdf

Finally, coalesced memory access was discussed as a very good practice for efficiency.

(Figure: cuda_coalesced_memory_access. The good, the not-so-good, and the bad types of memory access.)
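As a hedged sketch of the difference (hypothetical kernels): adjacent threads reading adjacent addresses let a warp's accesses combine into a few memory transactions, while strided access spreads them out:

```cuda
__global__ void coalescedCopy(const float *in, float *out, int n)
{
    // Good: thread i touches element i, so adjacent threads access
    // adjacent addresses and the warp's loads coalesce.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

__global__ void stridedCopy(const float *in, float *out, int n, int stride)
{
    // Bad: thread i touches element i * stride, scattering the warp's
    // loads across memory so they cannot be combined.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n)
        out[i] = in[i * stride];
}
```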

Week 2 lectures

Week 2 Assignment