Despite achieving great success in real-world applications, Deep Reinforcement Learning (DRL) still suffers from three critical issues: low data efficiency, lack of interpretability, and limited transferability. Recent research shows that embedding symbolic knowledge into DRL is a promising way to address these challenges. Inspired by this, we introduce a novel deep reinforcement learning framework with symbolic options. Our framework features a loop training procedure that guides policy improvement by planning with planning models (including action models and hierarchical task network models) and symbolic options, both learned automatically from interactive trajectories. The learned symbolic options reduce the heavy reliance on expert domain knowledge and make policies inherently interpretable. Moreover, transferability and data efficiency can be further improved by planning with the symbolic planning models. To validate the effectiveness of our framework, we conduct experiments on two domains, Montezuma's Revenge and Office World. The results demonstrate comparable performance together with improved data efficiency, interpretability, and transferability.
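To make the idea of planning over symbolic options concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes a symbolic option can be summarized by a set of precondition propositions and a set of effect propositions, and it substitutes a plain breadth-first search for the action-model and hierarchical task network planners named in the abstract. The names `SymbolicOption` and `plan`, as well as the Office World-style propositions, are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class SymbolicOption:
    """Hypothetical symbolic option: applicable when `precondition` holds,
    and assumed to add `effect` to the symbolic state when it terminates."""
    name: str
    precondition: FrozenSet[str]
    effect: FrozenSet[str]


def plan(start: FrozenSet[str], goal: FrozenSet[str],
         options: List[SymbolicOption]) -> Optional[List[str]]:
    """Breadth-first search over symbolic states (a stand-in planner).

    Returns a shortest sequence of option names reaching `goal`, or None.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal <= state:  # every goal proposition holds
            return path
        for opt in options:
            if opt.precondition <= state:  # option is applicable
                nxt = state | opt.effect
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [opt.name]))
    return None


# Toy task with invented propositions, loosely in the spirit of Office World.
options = [
    SymbolicOption("get_coffee", frozenset(), frozenset({"has_coffee"})),
    SymbolicOption("get_mail", frozenset(), frozenset({"has_mail"})),
    SymbolicOption("deliver", frozenset({"has_coffee", "has_mail"}),
                   frozenset({"delivered"})),
]
option_plan = plan(frozenset(), frozenset({"delivered"}), options)
print(option_plan)
```

In a loop training procedure of the kind described above, each option in the returned plan would presumably be executed by a learned low-level policy, and the resulting trajectories would be used to refine both the options and the planning models. That part is omitted here because the abstract does not specify it.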