In recent years, multi-hop reasoning has been widely studied for knowledge graph (KG) reasoning due to its efficacy and interpretability. However, previous multi-hop reasoning approaches suffer from two primary shortcomings. First, agents struggle to learn effective and robust policies in the early training phase because rewards are sparse. Second, these approaches often falter on sparse knowledge graphs, where agents must traverse long reasoning paths. To address these problems, we propose FULORA, a multi-hop reasoning model with dual agents based on hierarchical reinforcement learning (HRL). FULORA tackles the above reasoning challenges through eFficient gUidance-expLORAtion between dual agents. The high-level agent walks on a simplified knowledge graph to provide stage-wise hints for the low-level agent, which walks on the original knowledge graph. In this framework, the low-level agent optimizes a value function that balances two objectives: (1) maximizing the return, and (2) integrating efficient guidance from the high-level agent. Experiments on three real-world knowledge graph datasets demonstrate that FULORA outperforms RL-based baselines, especially on long-distance reasoning.
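The two-objective value function described above can be illustrated with a minimal sketch. This is our own illustrative assumption, not the paper's exact formulation: the low-level agent's per-step reward blends the environment (task) reward with a guidance bonus measuring agreement between its chosen action and the high-level agent's stage-wise hint, weighted by a hypothetical trade-off coefficient `lam`.

```python
import math

def combined_reward(env_reward, action_emb, hint_emb, lam=0.5):
    """Illustrative reward for the low-level agent (assumed form):
    task reward plus lam * cosine similarity between the embedding of
    the chosen action and the high-level agent's hint embedding."""
    dot = sum(a * b for a, b in zip(action_emb, hint_emb))
    na = math.sqrt(sum(a * a for a in action_emb))
    nb = math.sqrt(sum(b * b for b in hint_emb))
    guidance = dot / (na * nb) if na and nb else 0.0
    return env_reward + lam * guidance

# An action aligned with the hint earns a guidance bonus;
# an orthogonal action earns none.
aligned = combined_reward(1.0, [1.0, 0.0], [1.0, 0.0])
orthogonal = combined_reward(1.0, [1.0, 0.0], [0.0, 1.0])
```

Here `lam` controls the balance between the two objectives: `lam = 0` recovers pure return maximization, while larger values push the low-level agent to follow the high-level agent's hints more closely.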