A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization


KIPS Transactions on Computer and Communication Systems, Vol. 6, No. 5, pp. 219-230, May 2017
DOI: 10.3745/KTCCS.2017.6.5.219
Keywords: Parallelism, Performance, Warp Scheduling, Resource utilization
Abstract

General-Purpose Graphics Processing Units (GPGPUs) employ massively parallel architectures and multithreading to exploit parallelism. With programming models such as CUDA and OpenCL, GPGPUs are becoming increasingly effective at exploiting the abundant thread-level parallelism of parallel applications. Unfortunately, modern GPGPUs cannot efficiently utilize their available hardware resources for many general-purpose applications. One of the primary reasons is that existing warp/thread block schedulers are inefficient at hiding long-latency instructions, which results in lost opportunities to improve performance. This paper studies the effects of the hardware thread scheduling policy on GPGPU performance. We propose a novel warp scheduling policy that alleviates the drawbacks of the traditional round-robin policy. The proposed warp scheduler first classifies the warps of a thread block into two groups, warps with long latency and warps with short latency, and then schedules the warps with long latency before the warps with short latency. Furthermore, to support the proposed warp scheduler, we also propose a supplemental technique that dynamically reduces the number of streaming multiprocessors to which thread blocks are assigned when contention in the memory and interconnection network is high. In our experiments on a GPGPU platform with 15 streaming multiprocessors, the proposed warp scheduling policy improves IPC by 7.5% on average over the baseline round-robin warp scheduling policy. When the two proposed techniques are combined, GPGPU performance improves by approximately 8.9% on average.
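
The two ideas in the abstract can be illustrated with a toy sketch in Python. This is not the authors' implementation: the abstract does not say how warps are classified or how contention is measured, so the `long_latency` flag, the `contention` value, the `threshold`, and the one-SM-at-a-time reduction below are all hypothetical assumptions chosen only to show the ordering and throttling logic.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    warp_id: int
    # Hypothetical flag: the abstract does not describe how a warp is
    # classified as long-latency or short-latency.
    long_latency: bool
    ready: bool = True


def issue_order(warps):
    """Order ready warps so the long-latency group is scheduled first.

    Within each group the original (round-robin style) order is kept.
    """
    ready = [w for w in warps if w.ready]
    long_group = [w for w in ready if w.long_latency]
    short_group = [w for w in ready if not w.long_latency]
    return long_group + short_group


def active_sm_count(total_sms, contention, threshold=0.8, min_sms=1):
    """Hypothetical throttle for thread block assignment.

    When the assumed contention metric exceeds the assumed threshold,
    one fewer streaming multiprocessor receives thread blocks.
    """
    if contention > threshold:
        return max(min_sms, total_sms - 1)
    return total_sms


warps = [Warp(0, False), Warp(1, True), Warp(2, False), Warp(3, True)]
order = [w.warp_id for w in issue_order(warps)]
print(order)                      # [1, 3, 0, 2]
print(active_sm_count(15, 0.9))   # 14
print(active_sm_count(15, 0.5))   # 15
```

Issuing the long-latency group first gives those warps more time to complete while the short-latency warps execute, which is one plausible reading of how the policy hides latency better than plain round-robin.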



Cite this article
[IEEE Style]
D. C. Thuan, Y. Choi, J. M. Kim, C. H. Kim, "A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization," KIPS Transactions on Computer and Communication Systems, vol. 6, no. 5, pp. 219-230, 2017. DOI: 10.3745/KTCCS.2017.6.5.219.

[ACM Style]
Do Cong Thuan, Yong Choi, Jong Myon Kim, and Cheol Hong Kim. 2017. A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization. KIPS Transactions on Computer and Communication Systems, 6, 5, (2017), 219-230. DOI: 10.3745/KTCCS.2017.6.5.219.