Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how best to guide exploration remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. These theories view curiosity as an intrinsic motivation to optimize topological features of the subgraphs induced by the nodes visited in the environment. We use these features as rewards for graph neural network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than those seen during training. Our method is also more computationally efficient than greedy evaluation of the relevant topological properties. The proposed intrinsic motivations are particularly relevant to recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality on several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
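To make the idea of rewarding topological features of the visited-node subgraph concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the specific features, so this example assumes a simple stand-in: the number of independent cycles (edges minus nodes plus connected components) in the induced subgraph, with the per-step reward being the change in that count. The function names `induced_edges`, `count_components`, `cycle_count`, and `walk_rewards` are hypothetical, and the graph is assumed to be an undirected adjacency dictionary that lists every node as a key.

```python
def induced_edges(adj, visited):
    """Undirected edges of the subgraph induced by `visited`, each counted once."""
    nodes = set(visited)
    return {
        frozenset((u, v))
        for u in nodes
        for v in adj[u]
        if v in nodes and u != v  # ignore self-loops
    }


def count_components(nodes, edges):
    """Number of connected components, computed with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        u, v = tuple(edge)
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def cycle_count(adj, visited):
    """Independent cycles in the induced subgraph: |E| - |V| + components."""
    nodes = set(visited)
    edges = induced_edges(adj, nodes)
    return len(edges) - len(nodes) + count_components(nodes, edges)


def walk_rewards(adj, walk):
    """Per-step reward: change in the assumed topological feature (cycle count)."""
    rewards, visited, previous = [], [], 0
    for node in walk:
        visited.append(node)
        current = cycle_count(adj, visited)
        rewards.append(current - previous)
        previous = current
    return rewards


# Usage: a 4-node ring. The cycle only closes once the last node is visited.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(walk_rewards(ring, [0, 1, 2, 3]))
```

Recomputing the feature from scratch at every step, as this sketch does, corresponds to the kind of direct evaluation that a learned policy is meant to avoid at inference time; a trained agent would instead predict which next node is likely to yield a high reward.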