Graph representation learning methods are effective at handling complex non-Euclidean data because they capture the relationships and features encoded in graph structures. However, traditional methods struggle with heterogeneous graphs, which contain multiple types of nodes and edges and whose data come from diverse sources in complex forms. Existing Heterogeneous Graph Neural Networks (HGNNs) have shown promising results, but they require prior knowledge of node and edge types as well as a unified node feature format, which limits their applicability. Recent work on graph representation learning with Large Language Models (LLMs) offers new solutions: by drawing on the data processing capabilities of LLMs, it enables different graph representations to be aligned. Nevertheless, these methods often overlook heterogeneous graph data and require extensive preprocessing. To address these limitations, we propose a novel method that combines the strengths of LLMs and GNNs and can process graph data whose nodes and edges have arbitrary formats and types, without requiring type information or special preprocessing. Our method employs an LLM to automatically summarize and classify the different data formats and types, aligns the node features, and then uses a specialized GNN for targeted learning, yielding effective graph representations for downstream tasks. Theoretical analysis and experimental validation demonstrate the effectiveness of our method.
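As a rough illustration of the pipeline outlined above (classify raw node data, align node features, then learn with a type-aware GNN), the following toy sketch may help. It is not the proposed method: `infer_type` is a hypothetical rule-based stand-in for the LLM step, `align_features` uses feature hashing in place of a learned encoder, `propagate` is a single untrained mean-aggregation round rather than a specialized GNN, and the size `DIM` and the example nodes are invented for illustration.

```python
import hashlib
import math
from collections import defaultdict

DIM = 8  # hypothetical size of the shared feature space


def infer_type(raw):
    """Hypothetical stand-in for the LLM step: guess a coarse node type."""
    if isinstance(raw, dict):
        return "record"
    if isinstance(raw, (list, tuple)):
        return "vector"
    return "text"


def align_features(raw, dim=DIM):
    """Map a raw feature of any format to a unit vector of length dim.

    Feature hashing is used here only as a placeholder encoder.
    """
    if isinstance(raw, dict):
        items = sorted(raw.items(), key=lambda kv: str(kv[0]))
        tokens = [f"{k}={v}" for k, v in items]
    elif isinstance(raw, (list, tuple)):
        tokens = [f"{i}:{x}" for i, x in enumerate(raw)]
    else:
        tokens = str(raw).lower().split()
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def propagate(features, types, edges):
    """One round of type-aware mean aggregation over undirected edges.

    Neighbors are averaged per inferred type, then the per-type means
    are averaged together with the node's own feature vector.
    """
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = {}
    for node, own in features.items():
        by_type = defaultdict(list)
        for nb in neighbors[node]:
            by_type[types[nb]].append(features[nb])
        parts = [own]
        for group in by_type.values():
            parts.append([sum(col) / len(group) for col in zip(*group)])
        out[node] = [sum(col) / len(parts) for col in zip(*parts)]
    return out


# Invented example: three nodes whose raw features have different formats.
raw_nodes = {
    "p1": "heterogeneous graph neural networks",
    "a1": {"name": "Ada", "affiliation": "Example University"},
    "v1": [0.2, 0.7, 0.1],
}
edges = [("p1", "a1"), ("p1", "v1")]

types = {n: infer_type(r) for n, r in raw_nodes.items()}
features = {n: align_features(r) for n, r in raw_nodes.items()}
embeddings = propagate(features, types, edges)
```

In this sketch no node or edge types are supplied by the caller; they are inferred from the raw data, which mirrors the stated goal of working without type information. A real system would replace each placeholder with an LLM call, a learned feature encoder, and trainable GNN layers.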