Graphs with rich attributes are essential for modeling interconnected entities and improving predictions in many real-world applications. Traditional Graph Neural Networks (GNNs), which are commonly used to model attributed graphs, must be retrained each time they are applied to a different graph task or dataset. Although the emergence of Large Language Models (LLMs) has introduced a new paradigm in natural language processing, the generative potential of LLMs in graph mining remains largely underexplored. To this end, we propose MuseGraph, a novel framework that integrates the strengths of GNNs and LLMs and enables a more effective and generic approach to graph mining across tasks and datasets. Specifically, we first introduce a compact graph description, produced by the proposed adaptive input generation, that encapsulates key information from the graph within the token limits of the language model. Then, we propose a diverse instruction generation mechanism, which distills the reasoning capabilities of LLMs (e.g., GPT-4) to create task-specific Chain-of-Thought-based instruction packages for different graph tasks. Finally, we propose graph-aware instruction tuning with a dynamic instruction package allocation strategy across tasks and datasets, ensuring the effectiveness and generalization of the training process. Our experimental results demonstrate significant improvements on different graph tasks, showcasing the potential of MuseGraph to improve the accuracy of graph-oriented downstream tasks while preserving the generative capabilities of LLMs.
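To make the idea of a compact graph description under a token limit more concrete, the following is a minimal sketch, not the MuseGraph implementation. It assumes that whitespace-separated words approximate tokens, that neighbors are ranked by degree, and that the target node's own attributes are always kept. The names `count_tokens` and `describe_node` and the toy graph are hypothetical.

```python
from typing import Dict, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def describe_node(
    node: str,
    attrs: Dict[str, str],
    adj: Dict[str, List[str]],
    budget: int,
) -> str:
    """Describe `node` in text, adding neighbors until `budget` tokens is reached."""
    # The target node's own attributes are always included.
    parts = [f"Node {node}: {attrs[node]}."]
    used = count_tokens(parts[0])

    # Assumption: higher-degree neighbors are treated as more informative;
    # ties are broken by name so the output is deterministic.
    ranked = sorted(adj.get(node, []), key=lambda n: (-len(adj.get(n, [])), n))

    for neighbor in ranked:
        piece = f"Neighbor {neighbor}: {attrs[neighbor]}."
        cost = count_tokens(piece)
        if used + cost > budget:
            break  # stop at the first neighbor that would exceed the budget
        parts.append(piece)
        used += cost
    return " ".join(parts)


# Hypothetical toy graph used only for illustration.
attrs = {
    "a": "graph neural networks",
    "b": "language models",
    "c": "instruction tuning",
    "d": "graph mining",
}
adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}

print(describe_node("a", attrs, adj, budget=12))
```

With a budget of 12 words, only the highest-ranked neighbor fits alongside the node's own attributes; raising the budget admits more of the neighborhood. A real system would replace the word count with the tokenizer of the target LLM.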