A Performance Study on CPU-GPU Data Transfers of Unified Memory Device


KIPS Transactions on Computer and Communication Systems, Vol. 11, No. 5, pp. 133-138, May 2022
https://doi.org/10.3745/KTCCS.2022.11.5.133
Keywords: HPC, GPU, Unified Memory, Data Transfer
Abstract

As GPU performance has improved, GPUs have become increasingly common in HPC and artificial intelligence, yet GPU programming remains a major obstacle to productivity. In particular, because host memory and GPU memory are difficult to manage separately, the problem is being actively studied from both the convenience and the performance perspective, and various CPU-GPU memory transfer programming methods have been proposed. Meanwhile, many SoC (System on a Chip) products, such as Apple M1 and NVIDIA Tegra, that bundle a CPU, a GPU, and unified memory into one large silicon package are emerging. This study examines the performance of data transfers between the CPU and the GPU on such unified memory devices, which behave differently from the conventional environment in which the CPU's host memory and the GPU memory are separate. We compare performance by CPU-GPU data transfer method on NVIDIA SoC chips, which are unified memory devices, and on NVIDIA SXM-based V100 GPU devices. The experimental workload is a two-dimensional matrix transposition, an example frequently used in HPC applications. We analyze the following performance factors: the difference in GPU kernel performance according to the CPU-GPU memory transfer method on each GPU device, the difference in transfer performance between page-locked memory and pageable memory, overall performance, and performance by workload size. The experiments confirm that the NVIDIA Xavier can maximize the benefits of unified memory in the SoC by supporting I/O cache coherence.
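
The transfer methods named in the abstract can be illustrated with a minimal CUDA sketch. This is not the authors' benchmark code: the kernel, the helper names (`run_explicit`, `run_managed`, `count_errors`), the 16x16 block size, and the 1024x1024 matrix are assumptions chosen for illustration. It shows one naive transpose kernel driven three ways: explicit copies from pageable host memory, explicit copies from page-locked host memory, and a managed allocation that needs no explicit copy. It checks correctness only; timing (for example with CUDA events) is left out.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

// Naive transpose of an n x n row-major matrix: out[x][y] = in[y][x].
__global__ void transpose(const float* in, float* out, int n) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < n && y < n) out[x * n + y] = in[y * n + x];
}

static void fill(float* a, int n) {
    for (int i = 0; i < n * n; ++i) a[i] = (float)i;
}

// Number of elements where out is not the transpose of in.
static int count_errors(const float* in, const float* out, int n) {
    int bad = 0;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            if (out[c * n + r] != in[r * n + c]) ++bad;
    return bad;
}

static void launch(const float* in, float* out, int n) {
    dim3 block(16, 16);  // assumed block size, not taken from the paper
    dim3 grid((n + 15) / 16, (n + 15) / 16);
    transpose<<<grid, block>>>(in, out, n);
    CHECK(cudaGetLastError());
}

// Explicit cudaMemcpy with pageable (malloc) or page-locked (cudaMallocHost) host buffers.
int run_explicit(int n, bool pinned) {
    size_t bytes = (size_t)n * n * sizeof(float);
    float *h_in = NULL, *h_out = NULL, *d_in = NULL, *d_out = NULL;
    if (pinned) {
        CHECK(cudaMallocHost((void**)&h_in, bytes));
        CHECK(cudaMallocHost((void**)&h_out, bytes));
    } else {
        h_in = (float*)malloc(bytes);
        h_out = (float*)malloc(bytes);
        if (!h_in || !h_out) { fprintf(stderr, "host allocation failed\n"); exit(1); }
    }
    CHECK(cudaMalloc((void**)&d_in, bytes));
    CHECK(cudaMalloc((void**)&d_out, bytes));
    fill(h_in, n);
    CHECK(cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice));
    launch(d_in, d_out, n);
    CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));  // blocks until the kernel is done
    int bad = count_errors(h_in, h_out, n);
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    if (pinned) { CHECK(cudaFreeHost(h_in)); CHECK(cudaFreeHost(h_out)); }
    else { free(h_in); free(h_out); }
    return bad;
}

// Managed allocation: the same pointers are used on the CPU and the GPU.
int run_managed(int n) {
    size_t bytes = (size_t)n * n * sizeof(float);
    float *in = NULL, *out = NULL;
    CHECK(cudaMallocManaged((void**)&in, bytes));
    CHECK(cudaMallocManaged((void**)&out, bytes));
    fill(in, n);
    launch(in, out, n);
    CHECK(cudaDeviceSynchronize());  // required before the CPU reads the result
    int bad = count_errors(in, out, n);
    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return bad;
}

int main() {
    const int n = 1024;  // assumed workload size
    int e1 = run_explicit(n, false);
    int e2 = run_explicit(n, true);
    int e3 = run_managed(n);
    printf("mismatches: pageable=%d pinned=%d managed=%d\n", e1, e2, e3);
    return (e1 || e2 || e3) ? 1 : 0;
}
```

Assuming a CUDA toolkit is installed, the file can be built with `nvcc` and run on either a discrete GPU or an SoC device. Wrapping each variant in timers and varying `n` would be one way to approach the kind of comparison the abstract describes.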


Cite this article
[IEEE Style]
O. Kwon and G. Gu, "A Performance Study on CPU-GPU Data Transfers of Unified Memory Device," KIPS Transactions on Computer and Communication Systems, vol. 11, no. 5, pp. 133-138, 2022. DOI: https://doi.org/10.3745/KTCCS.2022.11.5.133.

[ACM Style]
Oh-Kyoung Kwon and Gibeom Gu. 2022. A Performance Study on CPU-GPU Data Transfers of Unified Memory Device. KIPS Transactions on Computer and Communication Systems, 11, 5, (2022), 133-138. DOI: https://doi.org/10.3745/KTCCS.2022.11.5.133.