Goal-Conditioned Reinforcement Learning (GCRL) is a framework for learning a single policy that can reach arbitrary given goals. Contrastive Reinforcement Learning (CRL), in particular, updates the policy using a value-function approximation estimated via contrastive learning, achieving higher sample efficiency than conventional methods. However, since CRL treats visited states as pseudo-goals during learning, it can accurately estimate the value function only for a limited set of goals. To address this issue, we propose a novel data augmentation approach for CRL called ViSA (Visited-State Augmentation). ViSA consists of two components: 1) augmented state sample generation, which synthesizes samples of states that are hard to visit during on-policy exploration, and 2) consistent embedding-space learning, which uses the augmented states as auxiliary information to regularize the embedding space by reformulating its objective function in terms of mutual information. We evaluate ViSA on simulated and real-world robotic tasks and show improved goal-space generalization, enabling accurate value estimation for hard-to-visit goals. Further details can be found on the project page: \href{https://issa-n.github.io/projectPage_ViSA/}{\texttt{https://issa-n.github.io/projectPage\_ViSA/}}
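To make the contrastive value-estimation setting concrete, the sketch below shows a minimal InfoNCE-style contrastive objective of the kind CRL builds on: each state-action embedding is paired with the embedding of a visited state treated as its pseudo-goal (positives on the batch diagonal), with the other goals in the batch serving as negatives. This is an illustrative NumPy sketch of the generic contrastive objective, not the paper's actual implementation; the function and variable names are our own.

```python
import numpy as np

def info_nce_loss(sa_emb, goal_emb):
    """InfoNCE-style contrastive loss.

    sa_emb:   (B, d) state-action embeddings.
    goal_emb: (B, d) goal embeddings; row i is the positive (pseudo-goal,
              i.e. the visited state) for sa_emb row i, all other rows
              in the batch act as negatives.
    """
    logits = sa_emb @ goal_emb.T                      # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))             # positives on diagonal

rng = np.random.default_rng(0)
B, d = 8, 4
sa = rng.normal(size=(B, d))
sa /= np.linalg.norm(sa, axis=1, keepdims=True)       # unit-norm embeddings

goals_random = rng.normal(size=(B, d))
goals_random /= np.linalg.norm(goals_random, axis=1, keepdims=True)

loss_random = info_nce_loss(5.0 * sa, goals_random)   # unpaired goals
loss_aligned = info_nce_loss(5.0 * sa, sa)            # goal = visited state
print(loss_aligned, loss_random)
```

When the goal embeddings match their paired state-action embeddings, the diagonal dominates each row and the loss drops well below the uniform baseline of log B; with unpaired goals the loss stays near that baseline. ViSA's contribution sits on top of such an objective: augmenting the pool of states so that hard-to-visit regions also contribute informative positives and negatives.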