Efficient inference for graph neural networks (GNNs) on large knowledge graphs (KGs) is essential for many real-world applications. GNN inference queries are computationally expensive and vary in complexity, as each involves a different number of target nodes linked to subgraphs of diverse densities and structures. Existing acceleration methods, such as pruning, quantization, and knowledge distillation, instantiate smaller models but do not adapt them to the structure or semantics of individual queries. They also store models as monolithic files that must be fully loaded, and miss the opportunity to retrieve only the neighboring nodes and corresponding model components that are semantically relevant to the target nodes. These limitations lead to excessive data loading and redundant computation on large KGs. This paper presents KG-WISE, a task-driven inference paradigm for large KGs. KG-WISE decomposes trained GNN models into fine-grained components that can be partially loaded based on the structure of the queried subgraph. It employs large language models (LLMs) to generate reusable query templates that extract semantically relevant subgraphs for each task, enabling query-aware and compact model instantiation. We evaluate KG-WISE on six large KGs with up to 42 million nodes and 166 million edges. KG-WISE achieves up to 28x faster inference and 98% lower memory usage than state-of-the-art systems while maintaining or improving accuracy across both commercial and open-weight LLMs.
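The partial-loading idea described above can be illustrated with a minimal sketch. It is not the KG-WISE implementation: it assumes, purely for illustration, that a trained model has been split into one weight block per relation type and kept in a key-value store. The store contents, the key format `rel:<name>`, and the helper names `required_components` and `load_partial_model` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A triple is (subject, relation, object).
Triple = Tuple[str, str, str]

# Hypothetical component store: one small weight block per relation type.
# A real system would keep these in separate files or a key-value database.
COMPONENT_STORE: Dict[str, List[float]] = {
    "rel:authoredBy": [0.1, 0.2],
    "rel:publishedIn": [0.3, 0.4],
    "rel:cites": [0.5, 0.6],
    "rel:locatedIn": [0.7, 0.8],
}


def required_components(subgraph: List[Triple]) -> Set[str]:
    """Return the store keys for the relation types present in the subgraph."""
    return {f"rel:{relation}" for _, relation, _ in subgraph}


def load_partial_model(
    store: Dict[str, List[float]], subgraph: List[Triple]
) -> Dict[str, List[float]]:
    """Load only the components the queried subgraph needs."""
    keys = required_components(subgraph)
    return {key: store[key] for key in sorted(keys) if key in store}


# A task-specific subgraph that touches two of the four relation types.
subgraph: List[Triple] = [
    ("paper1", "authoredBy", "alice"),
    ("paper1", "publishedIn", "venueX"),
]

model = load_partial_model(COMPONENT_STORE, subgraph)
print(sorted(model))
```

In this toy setting only two of the four weight blocks are loaded, because the extracted subgraph never touches the `cites` or `locatedIn` relations.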