Link Prediction for Wikipedia Articles as a Natural Language Inference Task

The link prediction task is vital to automatically understanding the structure of large knowledge bases. In this paper, we present our system for this task at the Data Science and Advanced Analytics 2023 Competition "Efficient and Effective Link Prediction" (DSAA-2023 Competition), whose corpus contains 948,233 training samples and 238,265 public test samples. We formulate link prediction in Wikipedia articles as a natural language inference (NLI) task. Drawing inspiration from recent advances in natural language processing and understanding, we treat the presence of a link between two articles as a premise, and the task is to determine whether this premise holds based on the information presented in the articles. We implemented our system as Sentence Pair Classification for the Wikipedia article link prediction task. Our system achieved Macro F1-scores of 0.99996 and 1.00000 on the public and private test sets, respectively. Our team, UIT-NLP, ranked 3rd on the private test set, with a score equal to those of the first and second places. Our code is publicly available for research purposes.
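
The sentence-pair framing and the Macro F1 metric mentioned in the abstract can be illustrated with a minimal sketch. It is not our released system: the names `PairExample`, `build_pair`, `macro_f1`, and the `max_words` truncation are hypothetical, and the label convention (1 = link, 0 = no link) is an assumption made for the example. In practice the two texts would be passed to a sentence pair classifier; here only the input packaging and the metric are shown.

```python
from dataclasses import dataclass


@dataclass
class PairExample:
    text_a: str  # text of the first article (assumed field name)
    text_b: str  # text of the second article (assumed field name)
    label: int   # assumed convention: 1 = link exists, 0 = no link


def build_pair(source_text, target_text, label, max_words=64):
    """Truncate both article texts by word count and package them as one pair."""
    def clip(text):
        return " ".join(text.split()[:max_words])
    return PairExample(clip(source_text), clip(target_text), label)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because Macro F1 averages the per-class scores without weighting by class size, errors on the rarer class count as much as errors on the more frequent one.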

