Reinforcement learning (RL) has shown its strength in challenging sequential decision-making problems. The reward function in RL is crucial to learning performance, as it measures the degree of task completion. In real-world problems, rewards are predominantly human-designed, which requires laborious tuning and is prone to human cognitive bias. To achieve automatic auxiliary reward generation, we propose a novel representation learning approach that measures the ``transition distance'' between states. Building upon these representations, we introduce an auxiliary reward generation technique for both single-task and skill-chaining scenarios that requires no human knowledge. The proposed approach is evaluated on a wide range of manipulation tasks. The experimental results demonstrate the effectiveness of measuring the transition distance between states and the improvement induced by the auxiliary rewards, which not only yields better learning efficiency but also increases convergence stability.