DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

Dynamic text-attributed graphs (DyTAGs) are prevalent in many real-world scenarios: each node and edge is associated with a text description, and both the graph structure and the text descriptions evolve over time. Despite their broad applicability, benchmark datasets tailored to DyTAGs are scarce, which hinders progress in many research fields. To address this gap, we introduce the Dynamic Text-Attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains whose nodes and edges carry dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structures and natural language, highlighting the unique challenges posed by DyTAGs. Moreover, we conduct extensive benchmark experiments on DTGB, evaluating 7 popular dynamic graph learning algorithms and their variants that incorporate text attributes via embeddings from large language models (LLMs), along with 6 powerful LLMs. Our results reveal the limitations of existing models in handling DyTAGs. Our analysis also demonstrates the utility of DTGB for investigating how structural and textual dynamics can be integrated. DTGB fosters research on DyTAGs and their broad applications, offering a comprehensive benchmark for evaluating and advancing models that handle the interplay between dynamic graph structures and natural language. The dataset and source code are available at https://github.com/zjs123/DTGB.
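To make the notion of a DyTAG concrete, the following is a minimal sketch assuming a hypothetical in-memory layout; it is not DTGB's actual file format or loader. Edges are timestamped and carry their own text and an optional category, and each node keeps a history of (timestamp, text) pairs so its description can change over time. The class names `TextEdge` and `DyTAG`, the method names, and the 70/30 default split ratio are all illustrative assumptions. The chronological split reflects the general idea behind tasks such as future link prediction, where models are trained on past interactions and evaluated on later ones.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TextEdge:
    """A hypothetical timestamped edge with a text description."""
    src: int
    dst: int
    t: float
    text: str                    # edge text description
    label: Optional[int] = None  # edge category (e.g. for edge classification)


@dataclass
class DyTAG:
    """Hypothetical container: evolving node texts plus timestamped text edges."""
    # node id -> list of (timestamp, text), kept sorted by timestamp
    node_texts: Dict[int, List[Tuple[float, str]]] = field(default_factory=dict)
    edges: List[TextEdge] = field(default_factory=list)

    def set_node_text(self, node: int, t: float, text: str) -> None:
        """Record that `node` has description `text` from time `t` onward."""
        history = self.node_texts.setdefault(node, [])
        history.append((t, text))
        history.sort(key=lambda pair: pair[0])

    def add_edge(self, edge: TextEdge) -> None:
        self.edges.append(edge)

    def node_text_at(self, node: int, t: float) -> Optional[str]:
        """Return the most recent description of `node` at or before time `t`."""
        history = self.node_texts.get(node, [])
        times = [ts for ts, _ in history]  # rebuilt per call; fine for a sketch
        i = bisect_right(times, t)
        return history[i - 1][1] if i > 0 else None

    def chronological_split(
        self, train_frac: float = 0.7
    ) -> Tuple[List[TextEdge], List[TextEdge]]:
        """Split edges by time so every training edge precedes every test edge."""
        ordered = sorted(self.edges, key=lambda e: e.t)
        cut = int(len(ordered) * train_frac)
        return ordered[:cut], ordered[cut:]
```

For example, after `set_node_text(1, 0.0, "new account")` and `set_node_text(1, 5.0, "frequent reviewer")`, querying `node_text_at(1, 3.0)` returns the earlier description, so a model evaluated at time 3.0 never sees text that only appears later. Splitting by time rather than at random serves the same purpose for edges.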
