Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but its success depends critically on designing an appropriate reward mechanism. In many cases, the learning objective cannot be expressed under the Markovian assumption, and a more expressive reward mechanism is needed. Reward machines and omega-regular languages are two formalisms for expressing non-Markovian rewards, for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to provide an expressive and effective reward mechanism for RL. We present a model-free RL algorithm that computes epsilon-optimal strategies against omega-regular reward machines, and we evaluate its effectiveness through experiments.
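To make the notion of a non-Markovian reward concrete, the following is a minimal sketch of an ordinary reward machine, assuming a simple dictionary-based encoding. It is not the omega-regular reward machine introduced in the paper: the acceptance condition over infinite runs is not modeled. The class name `RewardMachine`, the helper `total_reward`, and the labels `coffee` and `office` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine that reads event labels and emits scalar rewards.

    Illustrative encoding only; not taken from the paper.
    """
    initial: str
    # Maps (machine_state, label) -> (next_machine_state, reward).
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial

    def reset(self):
        self.state = self.initial

    def step(self, label):
        # Pairs not listed in `transitions` keep the current state and give zero reward.
        next_state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Hypothetical task: reaching "office" is rewarded only after "coffee" was seen.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u0", 1.0),
    },
)


def total_reward(machine, labels):
    """Sum the rewards the machine emits along a finite sequence of labels."""
    machine.reset()
    return sum(machine.step(label) for label in labels)
```

In this sketch the label `office` yields reward 1 only when `coffee` occurred earlier, so the reward depends on the history of labels rather than on the current observation alone. Tracking the machine state alongside the environment state is what lets a standard RL algorithm treat such a reward as Markovian again.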