Goal-conditioned reinforcement learning (RL) with sparse rewards remains a challenging problem in deep RL. Hindsight Experience Replay (HER) has been demonstrated to be an effective solution: HER replaces the desired goals in failed experiences with states that were actually achieved. Existing approaches mainly focus on either exploration or exploitation to improve the performance of HER. From a joint perspective, exploiting specific past experiences can also implicitly drive exploration. We therefore concentrate on prioritizing both original and relabeled samples for efficient goal-conditioned RL. To this end, we propose a novel value consistency prioritization (VCP) method, in which the priority of a sample is determined by the consistency of its ensemble Q-values. This distinguishes VCP from most existing prioritization approaches, which prioritize samples based on the uncertainty of ensemble Q-values. Through extensive experiments, we demonstrate that VCP achieves significantly higher sample efficiency than existing algorithms on a range of challenging goal-conditioned manipulation tasks. We also visualize how VCP prioritizes good experiences to enhance policy learning.
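The core idea of prioritizing by the consistency of ensemble Q-values can be sketched as follows. The abstract does not specify the exact consistency measure or sampling scheme, so this is only an illustrative assumption: consistency is taken as the negative standard deviation across an ensemble of critics, and priorities are obtained via a softmax, with the `temperature` parameter being a hypothetical knob.

```python
import numpy as np

def vcp_priorities(q_ensemble, temperature=1.0):
    """Sketch of value consistency prioritization over a sampled batch.

    q_ensemble: array of shape (n_critics, batch_size), holding each
    critic's Q-estimate for each transition in the batch.

    Samples on which the critics agree (low ensemble disagreement)
    receive high priority -- the opposite of uncertainty-based schemes,
    which would favor high-disagreement samples.
    """
    disagreement = q_ensemble.std(axis=0)   # per-sample ensemble std, (batch_size,)
    consistency = -disagreement             # higher value = more consistent
    logits = consistency / temperature
    logits -= logits.max()                  # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()              # sampling distribution over the batch

# Example: two critics, two transitions; the critics agree on the first
# transition and disagree on the second, so the first gets higher priority.
q = np.array([[1.0, 0.0],
              [1.0, 2.0]])
priorities = vcp_priorities(q)
```

An uncertainty-based prioritizer would instead use `+disagreement` as the score, which highlights the single-sign-flip difference the abstract draws between VCP and prior work.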