The standard Reinforcement Learning (RL) model assumes Markovian rewards, yet many real-world tasks are non-Markovian: their rewards depend on the history of states rather than on the current state alone. Such tasks arise in practical applications such as autonomous driving, financial trading, and medical diagnosis, and solving them can be quite challenging. We propose a novel RL approach for learning with non-Markovian rewards expressed in the temporal logic LTL$_f$ (Linear Temporal Logic over Finite Traces). To this end, we introduce an encoding of linear complexity from LTL$_f$ into MDPs (Markov Decision Processes), which makes it possible to take advantage of advanced RL algorithms. We then use a prioritized experience replay technique based on the structure of the automaton (semantically equivalent to the LTL$_f$ specification) to improve the training process. We empirically evaluate our approach on several benchmark problems augmented with non-Markovian tasks to demonstrate its feasibility and effectiveness.
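To make the idea of turning a history-dependent reward into a Markovian one concrete, the following is a minimal sketch of the textbook automaton-product construction, not the linear encoding proposed in this work. It assumes a hypothetical task "observe `a`, then later observe `b`" (roughly the LTL$_f$ formula $F(a \wedge F\,b)$ when exactly one label holds per step), a hand-written automaton `TASK`, and illustrative helper names (`DFA`, `product_step`, `replay_priority`, `run`). The `replay_priority` heuristic is likewise only an assumption about how automaton structure might inform replay priorities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    """Deterministic finite automaton over single-symbol labels (illustrative)."""
    initial: int
    accepting: frozenset
    transitions: dict  # (state, label) -> next state; missing entries self-loop

    def step(self, q, label):
        return self.transitions.get((q, label), q)


# Hypothetical task: see "a", then later see "b".
TASK = DFA(initial=0, accepting=frozenset({2}),
           transitions={(0, "a"): 1, (1, "b"): 2})


def product_step(dfa, q, label):
    """Advance the automaton; the reward depends only on (q, label), so it is
    Markovian over the product state (environment state, q)."""
    q_next = dfa.step(q, label)
    reward = 1.0 if q_next in dfa.accepting and q not in dfa.accepting else 0.0
    return q_next, reward


def replay_priority(q, q_next):
    """Assumed heuristic: favor transitions that change the automaton state."""
    return 2.0 if q_next != q else 1.0


def run(dfa, labels):
    """Replay a finite trace of labels; return the final automaton state and return."""
    q, total = dfa.initial, 0.0
    for label in labels:
        q, r = product_step(dfa, q, label)
        total += r
    return q, total
```

In this sketch an agent would observe the pair (environment state, automaton state), so an off-the-shelf RL algorithm can be applied unchanged; the trace `["b", "a", "c", "b"]` reaches the accepting state only after `a` has been followed by `b`.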