With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often lack multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or are not built on real underlying academic graphs. In this paper, we present OAG-Bench, a comprehensive, multi-aspect, and fine-grained human-curated benchmark based on the Open Academic Graph (OAG). To date, OAG-Bench covers 10 tasks, 20 datasets, 70+ baselines, and 120+ experimental results. We propose new data annotation strategies for certain tasks and offer a suite of data preprocessing code, algorithm implementations, and standardized evaluation protocols to facilitate academic graph mining. Extensive experiments reveal that even advanced algorithms, including large language models (LLMs), struggle with key challenges in certain tasks, such as paper source tracing and scholar profiling. We also introduce the Open Academic Graph Challenge (OAG-Challenge) to encourage community input and sharing. We envisage that OAG-Bench can serve as a common ground for the community to evaluate and compare algorithms in academic graph mining, thereby accelerating algorithm development and advancement in this field. OAG-Bench is accessible at https://www.aminer.cn/data/.