There is a surge of interest in using formal languages such as Linear Temporal Logic (LTL) and finite automata to precisely and succinctly specify complex tasks and derive reward functions for reinforcement learning (RL) in robotic applications. However, existing methods often assign sparse rewards (e.g., giving a reward of 1 only if a task is completed and 0 otherwise), necessitating extensive exploration to converge to a high-quality policy. To address this limitation, we propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by LTL formulas and develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process. Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms and consistently outperforms baselines, achieving earlier convergence to better policies with higher task success rates and returns.
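The contrast between sparse and progress-based rewards can be illustrated with a minimal sketch. This is a hypothetical example, not the paper's exact construction: a small deterministic finite automaton (DFA) tracks progress on the LTL task F(a ∧ F b) ("eventually a, then eventually b"); the sparse reward pays 1 only in the accepting state, while a progress-based reward (in the spirit of potential-based shaping) pays whenever the remaining distance to acceptance shrinks. The automaton, its states, and the `DIST` table are all illustrative assumptions.

```python
# Hypothetical DFA for the LTL task F(a & F b):
# state 0 = start, state 1 = "a" seen, state 2 = "b" seen after "a" (accepting).
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}
ACCEPTING = {2}
# Distance (in DFA edges) from each state to the nearest accepting state.
DIST = {0: 2, 1: 1, 2: 0}

def sparse_reward(state: int) -> float:
    """Reward 1 only on task completion, 0 otherwise."""
    return 1.0 if state in ACCEPTING else 0.0

def progress_reward(prev_state: int, state: int) -> float:
    """Reward each unit of measurable progress toward acceptance."""
    return float(DIST[prev_state] - DIST[state])

def run(labels):
    """Accumulate both reward signals over a sequence of observed labels."""
    state, sparse_total, shaped_total = 0, 0.0, 0.0
    for lab in labels:
        nxt = TRANSITIONS[(state, lab)]
        sparse_total += sparse_reward(nxt)
        shaped_total += progress_reward(state, nxt)
        state = nxt
    return sparse_total, shaped_total
```

On the trace `["b", "a", "b"]` the sparse signal pays only at the final step, whereas the shaped signal also rewards the intermediate achievement of `a`, giving the agent a denser learning signal.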