GEC Student Work

Exploring a Reinforcement Learning Agent with Improved Prioritized Experience Replay for a Confrontation Game

Abstract: The deep Q-network (DQN) has been applied successfully in many reinforcement learning settings and to challenging tasks with real-world complexity. Its main limitation is the unacceptably long training time required to reach human-level performance. To address this obstacle, this paper proposes a new reinforcement learning strategy. The paper focuses on a two-player confrontation game environment with sparse rewards, no direct hindsight reward function, and no fixed goals. Guided by a set of domain strategies, the algorithm incorporates them into reinforcement learning through reward functions and experience replay, giving the agent the ability to evaluate positions in the middle of a game. To demonstrate the effectiveness of the proposed strategy, a new game is designed. The fence game is a confrontation game for two players in which one player tries to fence the other in so that it cannot move. The custom environment of this game gives a reward only at the end of a game: win, lose, or draw. Experimental results show that both 1) Prioritized Experience Replay with a Dynamic Hindsight reward function (DH-PER) and 2) Prioritized Experience Replay with a Dynamic Hindsight reward function and Sharing (DHS-PER) let the RL agents converge more quickly.
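The prioritized experience replay underlying DH-PER and DHS-PER can be illustrated with a minimal sketch of a proportional prioritized replay buffer, where transitions are sampled with probability proportional to a power of their priority (typically the TD error). The class and method names below are illustrative assumptions, not the paper's implementation, and the hindsight reward relabeling and sharing components are omitted.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch).

    Transitions are sampled with probability proportional to priority**alpha;
    priorities are usually set from the magnitude of the TD error.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # 0 = uniform sampling, 1 = fully prioritized
        self.buffer = []            # stored transitions
        self.priorities = []        # one scaled priority per transition
        self.pos = 0                # ring-buffer write position

    def add(self, transition, priority=1.0):
        # Small epsilon keeps every transition sampleable.
        p = (abs(priority) + 1e-6) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(p)
        else:
            # Overwrite the oldest transition once the buffer is full.
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Weighted sampling: higher-priority transitions are replayed more often.
        idxs = random.choices(range(len(self.buffer)),
                              weights=self.priorities, k=batch_size)
        return idxs, [self.buffer[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        # After a learning step, refresh priorities from the new TD errors.
        for i, e in zip(idxs, td_errors):
            self.priorities[i] = (abs(e) + 1e-6) ** self.alpha
```

In a sparse-reward confrontation game such as the fence game, a hindsight-style scheme would additionally relabel intermediate transitions with estimated rewards before they enter a buffer like this one.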