The link prediction task is vital to automatically understanding the structure of large knowledge bases. In this paper, we present our system for this task at the Data Science and Advanced Analytics 2023 Competition "Efficient and Effective Link Prediction" (DSAA-2023 Competition), whose corpus contains 948,233 training samples and 238,265 public test samples. Drawing inspiration from recent advancements in natural language processing and understanding, we formulate link prediction in Wikipedia articles as a natural language inference (NLI) task: the presence of a link between two articles is treated as a premise, and the task is to determine whether this premise holds based on the information presented in the articles. We implement our system as sentence pair classification for link prediction on Wikipedia articles. Our system achieved Macro F1-scores of 0.99996 and 1.00000 on the public and private test sets, respectively. Our team, UIT-NLP, ranked 3rd on the private test set, with a score equal to those of the first- and second-place teams. Our code is publicly available for research purposes.
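To make the formulation concrete, the following is a minimal sketch of how a pair of articles might be packed into a single sentence-pair input and how a Macro F1-score is computed over the two classes. It is not the system described above: the names `PairExample`, `build_pair`, `to_model_input`, and `macro_f1` are hypothetical, the `[CLS]`/`[SEP]` markers assume a BERT-style encoder, and the whitespace truncation with `max_words` is an illustrative choice only.

```python
from dataclasses import dataclass


@dataclass
class PairExample:
    premise: str     # text of the first article (assumed to act as the premise side)
    hypothesis: str  # text of the second article
    label: int       # 1 = a link exists between the two articles, 0 = no link


def build_pair(source_text, target_text, label, max_words=64):
    """Clip both texts to max_words whitespace tokens and wrap them as one pair."""
    def clip(text):
        return " ".join(text.split()[:max_words])
    return PairExample(clip(source_text), clip(target_text), label)


def to_model_input(example, cls="[CLS]", sep="[SEP]"):
    """Join the pair into one string using assumed BERT-style special markers."""
    return f"{cls} {example.premise} {sep} {example.hypothesis} {sep}"


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In practice a pretrained encoder's tokenizer would normally insert the special tokens itself, and a classifier head on top would produce the 0/1 prediction that `macro_f1` then scores against the gold labels.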