Identifying the most frequent induced subgraph of size $k$ in a target graph is a fundamental graph mining problem with direct applications in Web data mining and social network analysis. Despite its importance, finding the most frequent induced subgraph remains computationally expensive because the underlying subgraph counting task is NP-hard. Traditional exact enumeration algorithms have high time complexity, particularly for large subgraph sizes $k$. To mitigate this, existing approaches often adopt frequency measures that satisfy the downward closure property in order to prune the search space, which imposes additional constraints on the task. In this paper, we first formulate the task as a Markov Decision Process (MDP) and address it with a multi-task reinforcement learning framework. Specifically, we introduce RLMiner, a novel framework that combines reinforcement learning with our proposed task-state-aware Graph Neural Network (GNN) to find the most frequent induced subgraph of size $k$ in time linear in $k$. Extensive experiments on real-world datasets show that RLMiner identifies subgraphs whose frequencies closely match those of the ground-truth most frequent induced subgraphs, while achieving significantly shorter and more stable running times than traditional methods.
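The abstract does not spell out the MDP, so the following is only a minimal sketch under our own assumptions. The state is the set of vertices selected so far, an action adds one vertex adjacent to that set, an episode ends after $k-1$ actions (so the number of decisions is linear in $k$), and the terminal reward is the number of connected induced subgraphs of the target graph that are isomorphic to the final one. All names (`pattern_frequencies`, `rollout`, `degree_policy`, `episode_reward`, and so on) are hypothetical. `degree_policy` is a hand-written heuristic standing in for RLMiner's learned task-state-aware GNN policy, and `pattern_frequencies` is a brute-force exact-enumeration baseline of the kind described above as expensive: it checks every one of the $\binom{n}{k}$ vertex subsets and tries $k!$ relabelings of each.

```python
from collections import Counter
from itertools import combinations, permutations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(adj, nodes):
    """DFS restricted to `nodes`: is the induced subgraph connected?"""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def canonical_form(adj, nodes):
    """Brute-force isomorphism-invariant label of the subgraph induced by `nodes`.

    Tries all k! relabelings and keeps the lexicographically smallest sorted
    edge list, so isomorphic induced subgraphs receive the same label.
    """
    nodes = list(nodes)
    best = None
    for perm in permutations(range(len(nodes))):
        pos = dict(zip(nodes, perm))  # original vertex -> new index
        key = tuple(sorted(
            tuple(sorted((pos[u], pos[v])))
            for u, v in combinations(nodes, 2)
            if v in adj[u]
        ))
        if best is None or key < best:
            best = key
    return best


def pattern_frequencies(adj, k):
    """Exact baseline: count connected induced k-subgraphs per pattern."""
    counts = Counter()
    for nodes in combinations(sorted(adj), k):
        if is_connected(adj, nodes):
            counts[canonical_form(adj, nodes)] += 1
    return counts


def degree_policy(adj, state, frontier):
    """Hypothetical stand-in for a learned policy: prefer vertices with many
    edges into the current state, then high degree, then the smallest id."""
    return max(sorted(frontier), key=lambda v: (len(adj[v] & state), len(adj[v])))


def rollout(adj, seed, k, policy):
    """One MDP episode: state = selected vertex set, action = add a frontier vertex.

    Exactly k - 1 decisions are made, so policy calls grow linearly in k.
    """
    state = {seed}
    for _ in range(k - 1):
        frontier = {v for u in state for v in adj[u]} - state
        if not frontier:
            return None  # the seed's component has fewer than k vertices
        state.add(policy(adj, state, frontier))
    return frozenset(state)


def episode_reward(adj, nodes, counts):
    """Terminal reward: frequency of the pattern the episode ends in (0 if stuck)."""
    return 0 if nodes is None else counts[canonical_form(adj, nodes)]


if __name__ == "__main__":
    # Triangle 0-1-2 with a tail 2-3-4.
    adj = build_adj([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
    k = 3
    counts = pattern_frequencies(adj, k)
    for seed in sorted(adj):
        nodes = rollout(adj, seed, k, degree_policy)
        print(seed, sorted(nodes), episode_reward(adj, nodes, counts))
```

On this small example graph the exact counter finds three induced paths and one triangle, yet the greedy heuristic ends in the triangle from some seeds; that gap is what a learned policy would need to close. The reward here is computed from exact counts only to keep the sketch self-contained and is not a claim about how RLMiner computes its reward.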