Identifying the most frequent induced subgraph of size $k$ in a target graph is a fundamental graph mining problem with direct implications for Web-related data mining and social network analysis. Despite its importance, the problem remains computationally expensive because subgraph counting is NP-hard. Traditional exact enumeration algorithms often suffer from high time complexity, especially for large subgraph sizes $k$. To mitigate this, existing approaches often rely on frequency measures that satisfy the Downward Closure Property to prune the search space, which imposes additional constraints on the task. In this paper, we formulate the task as a Markov Decision Process and approach it with a multi-task reinforcement learning framework. Specifically, we introduce RLMiner, a novel framework that integrates reinforcement learning with our proposed task-state-aware Graph Neural Network to find the most frequent induced subgraph of size $k$ with time complexity linear in $k$. Extensive experiments on real-world datasets demonstrate that RLMiner identifies subgraphs whose frequencies closely match those of the ground-truth most frequent induced subgraphs, while achieving significantly shorter and more stable running times than traditional methods.
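To make the cost of exact enumeration concrete, the following is a minimal sketch of a brute-force baseline, not RLMiner itself. It assumes that frequency means the number of $k$-node subsets whose induced subgraph is isomorphic to the pattern, and that only connected patterns are of interest; both are assumptions made here for illustration. The function names (`canonical_form`, `is_connected`, `most_frequent_induced_subgraph`) are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_form(nodes, adj):
    """Return the smallest sorted edge tuple over all relabelings of `nodes`.

    Brute force over k! permutations, so only practical for small k.
    """
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        edges = tuple(sorted(
            tuple(sorted((relabel[u], relabel[v])))
            for u, v in combinations(nodes, 2)
            if v in adj[u]
        ))
        if best is None or edges < best:
            best = edges
    return best


def is_connected(nodes, adj):
    """Check whether the subgraph induced by `nodes` is connected."""
    node_set = set(nodes)
    start = next(iter(node_set))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & node_set:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == node_set


def most_frequent_induced_subgraph(edges, k):
    """Enumerate every k-node subset and count connected induced patterns.

    Returns (canonical_edge_tuple, count), or None if no connected
    k-node induced subgraph exists.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = Counter()
    # C(n, k) subsets: this is the step that blows up as k grows.
    for nodes in combinations(sorted(adj), k):
        if is_connected(nodes, adj):
            counts[canonical_form(nodes, adj)] += 1
    return counts.most_common(1)[0] if counts else None


# A triangle (0, 1, 2) with a pendant node 3 attached to node 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
result = most_frequent_induced_subgraph(example_edges, 3)
```

On this toy graph the three-node path occurs twice and the triangle once, so `result` holds the path pattern. The loop visits all $\binom{n}{k}$ node subsets and the canonicalization tries $k!$ relabelings, which illustrates why exact enumeration becomes impractical as $k$ grows.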