Max-Mean N-step Temporal-Difference Learning Using Multi-Step Return


KIPS Transactions on Computer and Communication Systems, Vol. 10, No. 5, pp. 155-162, May 2021
https://doi.org/10.3745/KTCCS.2021.10.5.155
Keywords: Reinforcement Learning, Q-learning, DQN, n-step Temporal-Difference Learning
Abstract

n-step TD learning is a combination of the Monte Carlo method and one-step TD learning. If an appropriate n is selected, n-step TD learning is known to perform better than both the Monte Carlo method and 1-step TD learning, but it is difficult to select the best value of n. To address the difficulty of selecting n in n-step TD learning, in this paper we exploit two characteristics: overestimation of Q can improve performance in early learning, and all n-step returns have similar values when Q ≈ Q*. Based on these, we propose a new learning target composed of the maximum and the mean of all k-step returns for 1 ≤ k ≤ n. Finally, in OpenAI Gym's Atari game environments, we compare the proposed algorithm with n-step TD learning and show that the proposed algorithm is superior to the n-step TD learning algorithm.
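To make the idea concrete, the sketch below computes all k-step returns G^(k) for 1 ≤ k ≤ n from a single trajectory fragment and combines their maximum and mean into one learning target. This is a minimal illustration under assumptions: the function name is hypothetical, bootstrapping uses max_a Q(s_{t+k}, a) as in Q-learning/DQN, and the equal 0.5/0.5 weighting of max and mean is an illustrative choice, not necessarily the paper's exact formulation.

```python
import numpy as np

def max_mean_nstep_target(rewards, gamma, q_next):
    """Illustrative max-mean multi-step TD target (names/weights assumed).

    rewards : rewards r_t, ..., r_{t+n-1} from one trajectory fragment (length n)
    gamma   : discount factor
    q_next  : bootstrap values max_a Q(s_{t+k}, a) for k = 1..n (length n)

    Returns a scalar target combining the max and the mean of all
    k-step returns G^(k), 1 <= k <= n.
    """
    n = len(rewards)
    returns = []
    disc_sum = 0.0  # running discounted reward sum: sum_{i<k} gamma^i * r_{t+i}
    for k in range(1, n + 1):
        disc_sum += (gamma ** (k - 1)) * rewards[k - 1]
        # k-step return: discounted rewards plus bootstrapped value at step t+k
        g_k = disc_sum + (gamma ** k) * q_next[k - 1]
        returns.append(g_k)
    returns = np.asarray(returns)
    # Equal-weight blend of max and mean (weighting is an assumption)
    return 0.5 * returns.max() + 0.5 * returns.mean()

# Example: n = 3 fragment with gamma = 0.99
target = max_mean_nstep_target([1.0, 0.0, 1.0], 0.99, [0.8, 0.9, 1.1])
```

The max term reflects the abstract's observation that early overestimation of Q can help learning, while the mean term stabilizes the target; near convergence (Q ≈ Q*), all k-step returns coincide, so the blend reduces to an ordinary n-step target regardless of the weighting.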



Cite this article
[IEEE Style]
G. Hwang, J. Kim, J. Heo, Y. Han, "Max-Mean N-step Temporal-Difference Learning Using Multi-Step Return," KIPS Transactions on Computer and Communication Systems, vol. 10, no. 5, pp. 155-162, 2021. DOI: https://doi.org/10.3745/KTCCS.2021.10.5.155.

[ACM Style]
Gyu-Young Hwang, Ju-Bong Kim, Joo-Seong Heo, and Youn-Hee Han. 2021. Max-Mean N-step Temporal-Difference Learning Using Multi-Step Return. KIPS Transactions on Computer and Communication Systems, 10, 5, (2021), 155-162. DOI: https://doi.org/10.3745/KTCCS.2021.10.5.155.