Week 5 -> Optimizing GPU Programs (continued)

Goal: maximize useful computation per second.

After walking through APOD (Analyze, Parallelize, Optimize, Deploy), the lecture turned to memory bandwidth. The CUDA utility deviceQuery reports the memory clock rate and the memory bus width, and the theoretical peak memory bandwidth is calculated from those two values. With the maximum theoretical bandwidth determined to be about 40 GB/s, practical goals were set for the fraction of that peak a kernel actually achieves:

40-60% -> OK
60-75% -> good
>75% -> excellent

The algorithm being analyzed is a transpose. When DRAM utilization is..
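A minimal sketch of the bandwidth arithmetic described above, written in Python for brevity. Every function name here is a hypothetical helper, not part of the CUDA toolkit or the lecture. It assumes peak bandwidth = memory clock × transfers per clock × bus width in bytes. NVIDIA's own bandwidth examples commonly multiply the reported clock by 2 for double-data-rate memory, but whether that factor applies depends on how the clock value was reported, so it is a parameter (`transfers_per_clock`, default 1). For the transpose it assumes each element is read once and written once, and it maps the achieved/peak ratio onto the 40/60/75% targets. The "below target" label for anything under 40% is this sketch's own invention.

```python
def peak_bandwidth_gbs(memory_clock_mhz: float, bus_width_bits: int,
                       transfers_per_clock: int = 1) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    memory_clock_mhz and bus_width_bits correspond to the values deviceQuery
    reports. transfers_per_clock is an assumption: 2 if the reported clock
    must be doubled for double-data-rate memory, 1 if it is already effective.
    """
    bytes_per_transfer = bus_width_bits / 8  # bits per clock -> bytes per clock
    return memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


def transpose_bytes_moved(n: int, element_bytes: int = 4) -> int:
    """DRAM traffic of an N x N transpose, assuming each element is read once
    and written once (4-byte floats by default)."""
    return 2 * n * n * element_bytes


def achieved_bandwidth_gbs(bytes_moved: int, elapsed_ms: float) -> float:
    """Effective bandwidth in GB/s from bytes moved and kernel time in ms."""
    return bytes_moved / (elapsed_ms / 1e3) / 1e9


def rate_utilization(achieved_gbs: float, peak_gbs: float) -> str:
    """Map achieved/peak onto the 40-60 / 60-75 / >75 percent targets.

    Boundary values (exactly 60% or 75%) are assigned to "good" here;
    "below target" is a hypothetical label for results under 40%.
    """
    fraction = achieved_gbs / peak_gbs
    if fraction > 0.75:
        return "excellent"
    if fraction >= 0.60:
        return "good"
    if fraction >= 0.40:
        return "OK"
    return "below target"


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not measurements from the lecture.
    peak = peak_bandwidth_gbs(memory_clock_mhz=2508, bus_width_bits=128)
    achieved = achieved_bandwidth_gbs(transpose_bytes_moved(1024), elapsed_ms=0.5)
    print(f"peak {peak:.1f} GB/s, achieved {achieved:.1f} GB/s, "
          f"{achieved / peak:.0%} -> {rate_utilization(achieved, peak)}")
```

With these made-up inputs the peak comes out near 40 GB/s and the transpose lands at roughly 42% of it, in the "OK" band. Passing `transfers_per_clock=2` doubles the peak and changes the rating, which is why it pays to check how a given tool reports the memory clock before trusting the percentage.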