Graph-based patterns are widely used by practitioners in industry because they capture both the behavioral attributes of users and the topological relationships among them, which makes them more interpretable than the black-box models commonly used for classification and recognition tasks. In transaction risk management, for instance, a graph pattern characteristic of a particular risk category can be used directly to identify risky transactions, map criminal networks, or investigate the methods used by fraudsters. However, graph data in industrial settings is often massive, with millions or even billions of nodes, so extracting graph patterns manually is labor-intensive and requires specialized knowledge of the risk domain in question. Existing graph pattern mining methods also face significant obstacles on large-scale attributed graphs. In this work, we introduce GraphRPM, an industry-oriented parallel and distributed framework for mining risk patterns on large attributed graphs. The framework combines a novel edge-involved graph isomorphism network with optimized operations for parallel graph computation, which together considerably reduce computational complexity and resource consumption. We also propose evaluation metrics that are used to automatically filter for effective risk patterns. Comprehensive experiments on real-world datasets of varying sizes show that GraphRPM addresses the challenges of mining patterns from large-scale industrial attributed graphs, underscoring its value for industrial deployment.