Text-attributed graphs (TAGs) are difficult for large language models (LLMs) to process directly, yet the extensive commonsense knowledge and strong reasoning capabilities of LLMs hold great promise for node classification in TAGs. Prior research in this area has struggled with over-squashing, heterophily, and ineffective integration of graph information, problems compounded by inconsistent dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS improves the quality of selected neighbors, thereby improving graph representation and alleviating over-squashing and heterophily. Moreover, as an inductive and training-free approach, SNS demonstrates better generalization and scalability than traditional GNN methods. Our comprehensive experiments, which adhere to standard dataset partitioning practices, show that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art results on datasets such as PubMed in node classification, showcasing the potential of LLMs for graph structure understanding. Our research further underscores the importance of graph structure integration in LLM applications and identifies key factors for their success in node classification. Code is available at https://github.com/ruili33/SNS.
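As a rough illustration of the idea behind similarity-based neighbor selection, the sketch below ranks a node's candidate neighbors by cosine similarity of their text embeddings, keeps the top k, and places their text in a classification prompt. The abstract does not specify the exact selection rule, so this is one plausible form under stated assumptions, not the SNS implementation: the function names (`select_neighbors`, `build_prompt`), the prompt wording, the labels, and the hand-written two-dimensional vectors standing in for SimCSE sentence embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_neighbors(
    target: str,
    candidates: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Return the k candidates most similar to the target (hypothetical helper)."""
    scored = [
        (cosine(embeddings[target], embeddings[c]), c)
        for c in candidates
        if c != target
    ]
    # Highest similarity first; ties are broken by node id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in scored[:k]]


def build_prompt(
    target_text: str, neighbor_texts: Sequence[str], labels: Sequence[str]
) -> str:
    """Assemble a plain-text classification prompt (wording is an assumption)."""
    lines = ["Target paper: " + target_text]
    for i, text in enumerate(neighbor_texts, start=1):
        lines.append(f"Related paper {i}: {text}")
    lines.append("Choose one category: " + ", ".join(labels))
    return "\n".join(lines)


# Toy data: placeholder vectors stand in for SimCSE sentence embeddings.
embeddings = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.9, 0.1],
    "paper_c": [0.0, 1.0],
    "paper_d": [0.7, 0.7],
}
texts = {
    "paper_a": "text of paper a",
    "paper_b": "text of paper b",
    "paper_c": "text of paper c",
    "paper_d": "text of paper d",
}

chosen = select_neighbors("paper_a", ["paper_b", "paper_c", "paper_d"], embeddings, k=2)
prompt = build_prompt(
    texts["paper_a"], [texts[n] for n in chosen], ["category_x", "category_y"]
)
print(chosen)
print(prompt)
```

Filtering by text similarity before prompting means dissimilar neighbors, such as `paper_c` here, never reach the LLM. This is one way a selection step could limit the influence of heterophilous edges while keeping the prompt short.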