Graph representation learning methods are highly effective on complex non-Euclidean data because they capture the relationships and features encoded in graph structure. However, traditional methods struggle with heterogeneous graphs, which contain multiple types of nodes and edges, because the data come from diverse sources and are inherently complex. Existing Heterogeneous Graph Neural Networks (HGNNs) have shown promising results, but they require prior knowledge of node and edge types as well as a unified node feature format, which limits their applicability. Recent advances in graph representation learning with Large Language Models (LLMs) offer new solutions: by drawing on LLMs' data processing capabilities, they enable the alignment of various graph representations. Nevertheless, these methods often overlook heterogeneous graph data and require extensive preprocessing. To address these limitations, we propose a novel method that combines the strengths of LLMs and Graph Neural Networks (GNNs) and can process graph data with nodes and edges of any format and type, without type information or special preprocessing. Our method uses an LLM to automatically summarize and classify the different data formats and types, aligns the node features, and applies a specialized GNN for targeted learning, yielding effective graph representations for downstream tasks. Theoretical analysis and experimental validation demonstrate the effectiveness of our method.
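The three stages named in the abstract (classify node formats and types, align node features, then learn with a type-aware GNN) can be illustrated with a toy sketch. This is not the authors' implementation, and the abstract does not specify any of these details. Everything here is an assumption made for illustration: `classify_node` is a rule-based stand-in for the LLM classification step, `align_features` uses hashing and zero-padding in place of a learned alignment, `type_aware_layer` is a single untrained mean-aggregation step rather than a real GNN, and the shared width `DIM = 8` is arbitrary.

```python
import hashlib
from collections import defaultdict

DIM = 8  # assumed shared feature width; not taken from the paper


def classify_node(raw):
    """Hypothetical stand-in for the LLM step that infers a node's type."""
    if isinstance(raw, str):
        return "text"
    if isinstance(raw, (int, float)):
        return "scalar"
    if isinstance(raw, (list, tuple)):
        return "vector"
    return "other"


def align_features(raw, dim=DIM):
    """Map a raw feature of any supported format to a vector of length dim."""
    if isinstance(raw, (list, tuple)):
        vals = [float(v) for v in raw][:dim]
        return vals + [0.0] * (dim - len(vals))  # truncate or zero-pad
    if isinstance(raw, (int, float)):
        return [float(raw)] + [0.0] * (dim - 1)
    # Strings and anything else: deterministic hash-based pseudo-embedding.
    digest = hashlib.sha256(str(raw).encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def type_aware_layer(features, types, edges):
    """One untrained message-passing step that averages neighbors per type."""
    neighbors = defaultdict(list)
    for u, v in edges:  # treat edges as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = {}
    for node, h in features.items():
        by_type = defaultdict(list)
        for nb in neighbors[node]:
            by_type[types[nb]].append(features[nb])
        # Mean within each inferred neighbor type, then mean across types.
        type_means = [
            [sum(col) / len(vecs) for col in zip(*vecs)]
            for vecs in by_type.values()
        ]
        if type_means:
            msg = [sum(col) / len(type_means) for col in zip(*type_means)]
            out[node] = [0.5 * a + 0.5 * b for a, b in zip(h, msg)]
        else:
            out[node] = list(h)  # isolated node keeps its own features
    return out


# Toy graph whose node features arrive in three different formats.
raw_nodes = {"p1": "a paper about graphs", "a1": 42, "v1": [0.5, 1.0]}
edges = [("p1", "a1"), ("p1", "v1")]

types = {n: classify_node(r) for n, r in raw_nodes.items()}
features = {n: align_features(r) for n, r in raw_nodes.items()}
embeddings = type_aware_layer(features, types, edges)
```

In this sketch no node or edge types are supplied by the caller: the types are inferred from the raw features, which mirrors the abstract's claim of working without type information. A real system would replace the rule-based classifier with LLM calls and the fixed averaging with trained, type-specific parameters.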