Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how best to guide exploration remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the nodes visited in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to longer exploratory walks and larger environments than those seen during training. Our method is also computationally more efficient than greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that next-node recommendations that account for curiosity are more predictive of human choices than PageRank centrality in several real-world graph environments.
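To make the idea of a reward defined on the subgraph induced by visited nodes concrete, the following is a minimal sketch of the greedy baseline mentioned above. It is not the method proposed here. The abstract does not name the topological features, so the cycle rank (edges minus nodes plus connected components) is used purely as an assumed stand-in, and the names `cycle_rank` and `greedy_walk` are hypothetical.

```python
from collections import deque


def cycle_rank(adj, visited):
    """Independent cycles (E - V + C) in the subgraph induced by `visited`.

    Assumed stand-in feature; node labels must be mutually comparable.
    """
    nodes = set(visited)
    # Count each undirected edge once by keeping only the (u < v) direction.
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes and u < v)
    # Count connected components of the induced subgraph with BFS.
    seen, components = set(), 0
    for source in nodes:
        if source in seen:
            continue
        components += 1
        seen.add(source)
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return edges - len(nodes) + components


def greedy_walk(adj, start, steps, feature=cycle_rank):
    """Walk that, at each step, moves to the neighbor maximizing `feature`.

    Ties are broken in favor of unvisited nodes, then by smallest label.
    Every step re-evaluates the feature once per candidate neighbor.
    """
    walk = [start]
    for _ in range(steps):
        candidates = sorted(adj[walk[-1]])
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda v: (feature(adj, walk + [v]), v not in walk),
        )
        walk.append(best)
    return walk


# Toy environment: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = greedy_walk(adj, start=0, steps=2)
```

In this toy graph the walk closes the triangle instead of leaving it, because closing it raises the feature from 0 to 1. The repeated per-candidate feature evaluation is the cost that a trained policy would avoid at inference time.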