To handle graphs whose features or connectivity evolve over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite their success, previous TGNN evaluations suffer from limitations in four critical respects: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) a lack of workload diversity, and 4) a lack of efficient comparison. Overall, no empirical study yet puts TGNN models on the same footing and compares them comprehensively. To this end, we propose BenchTemp, a general benchmark for evaluating TGNN models on various workloads. BenchTemp provides a set of benchmark datasets so that different TGNN models can be compared fairly. Further, BenchTemp engineers a standard pipeline that unifies TGNN evaluation. With BenchTemp, we extensively compare representative TGNN models on different tasks (e.g., link prediction and node classification) and settings (transductive and inductive), in terms of both effectiveness and efficiency metrics. We have made BenchTemp publicly available at https://github.com/qianghuangwhu/benchtemp.