In the field of Sequential Decision Making (SDM), two paradigms have historically vied for supremacy: Automated Planning (AP) and Reinforcement Learning (RL). In the spirit of reconciliation, this article reviews AP, RL, and hybrid methods (e.g., novel learn-to-plan techniques) for solving Sequential Decision Processes (SDPs), focusing on their knowledge representation: symbolic, subsymbolic, or a combination of the two. It also covers methods for learning the SDP structure. Finally, it compares the advantages and drawbacks of existing methods and concludes that neurosymbolic AI is a promising approach for SDM, since it combines AP and RL with a hybrid knowledge representation.