In recent years, multi-hop reasoning has been widely studied for knowledge graph (KG) reasoning due to its efficacy and interpretability. However, previous multi-hop reasoning approaches suffer from two primary shortcomings. First, agents struggle to learn effective and robust policies in the early training phase due to sparse rewards. Second, these approaches often falter on certain datasets, such as sparse knowledge graphs, where agents must traverse lengthy reasoning paths. To address these problems, we propose a multi-hop reasoning model with dual agents based on hierarchical reinforcement learning (HRL), named FULORA. FULORA tackles the above reasoning challenges through eFficient GUidance-ExpLORAtion between dual agents. The high-level agent walks on a simplified knowledge graph to provide stage-wise hints for the low-level agent, which walks on the original knowledge graph. In this framework, the low-level agent optimizes a value function that balances two objectives: (1) maximizing return, and (2) integrating efficient guidance from the high-level agent. Experiments conducted on three real-world knowledge graph datasets demonstrate that FULORA outperforms RL-based baselines, especially in the case of long-distance reasoning.
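The balanced value objective mentioned above can be illustrated with a minimal sketch. The weight `lam`, the function name, and the additive combination of return and guidance alignment are illustrative assumptions for exposition, not FULORA's actual formulation:

```python
def combined_objective(step_rewards, guidance_scores, lam=0.3):
    """Hypothetical sketch of the low-level agent's balanced objective:
    a convex combination of (1) the return accumulated on the original KG
    and (2) alignment with the high-level agent's stage-wise hints from
    the simplified KG. `lam` trades off guidance against return."""
    assert len(step_rewards) == len(guidance_scores)
    env_return = sum(step_rewards)       # objective (1): maximize return
    guidance = sum(guidance_scores)      # objective (2): follow guidance
    return (1 - lam) * env_return + lam * guidance

# Example: sparse terminal reward on the original KG, plus per-step
# agreement scores with the high-level agent's hints.
value = combined_objective([0.0, 0.0, 1.0], [0.8, 0.6, 0.9], lam=0.3)
```

With a sparse reward signal (nonzero only at the final step), the guidance term supplies dense intermediate feedback, which is the intuition behind mitigating the early-phase sparse-reward problem.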