Unit 3 worked on analyse the performance of CUDA programs along with some key algorithms: Reduce, Scan and Histogram. Parallelism is easiest with one-to-one or many-to-one communication patterns week 3 looked into how we still get great performance gains from parallelism when we need to do all-to-all and many-to-many communication patterns between threads. Just because ideal scaling may not be possible for an algorithm parallelism is likely still going to result in more efficient computing.

Step and work complexity are the first steps to establishing the truthfulness of this claim.

A parallel algorithm will be considered 'work efficient' if its work complexity is asymptotically to same as its serial counterpart
A parallel algorithm will be considered ‘work efficient’ if its work complexity is asymptotically to same as its serial counterpart

Comparing the step and work complexity of parallel algorithms is a good start to understanding their efficiency.

If we can reduce the step complexity in out parallel implementation compared to the serial implementation whilst maintaining work efficency we will have a faster execution.

REDUCE

Reduce has 2 inputs:

  1. Set of elements
  2. Reduction operation (ie: +)

Reduction operators must be binary, operate on two inputs and generate one output.

Reduction operators must be associative, a op b op c == c op b op a

eg: 1 + 2 + 3 == 3 + 2 + 1, whereas 1 – 2 – 3 != 3 – 1 – 2

So, some operators that are both binary and associative are: multiplication, minimum, logical OR, bitwise AND

After running through some examples of reduce including an implementation using shared memory in CUDA we  looked at Scan.

Inclusive vs Exclusive, Hillis Steele vs Blelloch where covered. Finally histograms was covered along with a quick touch on sort.

For the assignment I used a diagram from http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html to implement the solution.

Understanding this process is much easier with a diagram!
Understanding this process is much easier with a diagram!

Week 3 Lectures

Week 3 Assignment