The topic of week 4’s lectures and assignment was Sort and Scan. These are slightly more complex due to the many-to-many, all-to-all communication patterns.

Part 1 focused on Scan.

Important properties of Scan, work complexity: O(n), step complexity: O(log n).

Variations on the Scan algorithm:

- Compact/Filter – Gathering a subset that meet a certain criteria (input, predicate, output). Ie only keep input elements where the predicate is true. Outputs can be dense or sparse. Sparse is easy, as elements map to the same locations. Dense will result in contiguous array of the filtered elements (much faster for subsequent reading and writing).

In the card deck example where we want the output of only diamonds using compact will be much faster if the computecard() function is above a minimal computational cost. So compact is very useful when the number of elelments filtered out is large and the computation on each surviving element is high.

Summary of how to Compact:

- Predicate
- Scan-In Array: 1 / 0
- Exclusive sum scan (scatter addresses for the compacted array)
- Scatter input input element into output array using the scatter address addresses.

The predicate and scan parts of Compact will have the same run time regardless of the number of elements filtered. If very few items are to be written to the output then the scatter step will be faster.

A brief side not in the lecture introduced Allocate is a generalized version of Compact which can be used for clipping.

Next came Segmented Scan, we want to keep our kernels busy and independent. Segmented scan uses a second array to keep track of segment heads. This allows independent computation of scan values that can be reconciled when there is less work to do.

Sparse Matrix/Dense Vector multiplication [SpMv] was a practical application for the techniques we had just learned. SpMv is a good practical example as they are common in many problems domains, ie: (Page Ranking by search engines).

Traditional way to represent a spare matrix is a Compressed Sparse Row [CSR] – Value (non-zero data element values), Column (column of elements origin), RowPtr (element index that starts a new row)

Now that we have the CSR representation of the sparse matrix we can conduct the multiplication.

- Segment the Value array using the rowptr values as demarkation points
- Gather the vector values for each element in the array using the column array
- Use a Map function to multiply the value array with the gathered vector array.
- Conduct and exclusve segmented sum scan to get the final answer