Real-Time GPU Task Monitoring and Node List Management Techniques for Container Deployment in a Cluster-Based Container Environment


KIPS Transactions on Computer and Communication Systems, Vol. 11, No. 11, pp. 381-394, Nov. 2022
https://doi.org/10.3745/KTCCS.2022.11.11.381
Keywords: HPC Cloud, Container, GPU Computing, Real-time Task, Monitoring
Abstract

Recently, driven by the personalization and customization of data, Internet-based services increasingly require real-time processing, such as real-time AI inference and data analysis, which must be handled immediately according to the user's situation or requirements. Each real-time task has a deadline measured from the start of the task to the return of its results, and whether that deadline is met directly determines the quality of the service. However, traditional container systems are limited in running real-time tasks because they provide no means to assign or manage deadlines for tasks executed in containers. In addition, workloads such as AI inference and data analysis typically run on graphics processing units (GPUs), and because no performance isolation is provided between containers, containers sharing a GPU typically degrade one another's performance. Moreover, a node's resource usage alone cannot reveal the deadline guarantee rate of each container or whether a new real-time container should be deployed on that node. In this paper, we propose a monitoring technique that tracks and manages deadlines and the execution status of real-time GPU tasks in containers, in order to support real-time processing of GPU tasks running in containers, together with a node list management technique that places containers on appropriate nodes so that deadlines can be guaranteed. Our experiments further show that the proposed techniques have very little impact on the system.
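The abstract does not describe how deadline status is recorded or how the node list is built, so the following is only a rough illustration of the general idea, not the authors' implementation. It assumes that each completed GPU task reports its start time, finish time, and deadline; that a container's deadline guarantee rate is the fraction of recent tasks that met their deadline; and that a node is a placement candidate when the worst guarantee rate among its containers is at or above a threshold. The class `DeadlineMonitor`, its methods, and the `window_size` and `threshold` parameters are all hypothetical.

```python
from collections import defaultdict, deque


class DeadlineMonitor:
    """Hypothetical per-container deadline tracker and node-list builder."""

    def __init__(self, window_size: int = 100) -> None:
        # Keep only the most recent `window_size` outcomes per container.
        self.outcomes: dict[str, deque] = defaultdict(lambda: deque(maxlen=window_size))
        self.nodes: dict[str, None] = {}   # insertion-ordered set of node names
        self.node_of: dict[str, str] = {}  # container -> node it runs on

    def add_node(self, node: str) -> None:
        self.nodes[node] = None

    def register(self, container: str, node: str) -> None:
        self.add_node(node)
        self.node_of[container] = node

    def record(self, container: str, start: float, finish: float, deadline: float) -> None:
        # A task meets its deadline if its response time (finish - start)
        # does not exceed the relative deadline assigned to it (seconds).
        self.outcomes[container].append(finish - start <= deadline)

    def guarantee_rate(self, container: str) -> float:
        hist = self.outcomes.get(container)
        # Assumption: a container with no recorded tasks counts as fully guaranteed.
        return sum(hist) / len(hist) if hist else 1.0

    def node_list(self, threshold: float) -> list[str]:
        # Score each node by the worst guarantee rate among its containers;
        # nodes hosting no real-time containers score 1.0.
        worst = {node: 1.0 for node in self.nodes}
        for container, node in self.node_of.items():
            worst[node] = min(worst[node], self.guarantee_rate(container))
        eligible = [n for n, r in worst.items() if r >= threshold]
        # Best-scoring nodes first; ties broken by name for a stable order.
        return sorted(eligible, key=lambda n: (-worst[n], n))
```

For example, if container `c2` on `node-b` has missed half of its recent deadlines, `node_list(0.9)` would exclude `node-b` even if its raw GPU utilization looked acceptable. This reflects the abstract's point that node-level resource usage alone cannot decide whether a new real-time container should be placed there.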




Cite this article
[IEEE Style]
J. Kang and J. Gil, "Real-Time GPU Task Monitoring and Node List Management Techniques for Container Deployment in a Cluster-Based Container Environment," KIPS Transactions on Computer and Communication Systems, vol. 11, no. 11, pp. 381-394, 2022. DOI: https://doi.org/10.3745/KTCCS.2022.11.11.381.

[ACM Style]
Jihun Kang and Joon-Min Gil. 2022. Real-Time GPU Task Monitoring and Node List Management Techniques for Container Deployment in a Cluster-Based Container Environment. KIPS Transactions on Computer and Communication Systems, 11, 11, (2022), 381-394. DOI: https://doi.org/10.3745/KTCCS.2022.11.11.381.