The link prediction task is vital to automatically understanding the structure of large knowledge bases. In this paper, we present our system for this task at the Data Science and Advanced Analytics 2023 Competition "Efficient and Effective Link Prediction" (DSAA-2023 Competition), whose corpus contains 948,233 training samples and 238,265 public test samples. We approach link prediction in Wikipedia articles by formulating it as a natural language inference (NLI) task. Drawing inspiration from recent advances in natural language processing and understanding, we treat the presence of a link between two articles as a premise, and the task is to determine whether this premise holds based on the information presented in the articles. Our system is based on sentence pair classification for the Wikipedia article link prediction task. It achieved Macro F1-scores of 0.99996 and 1.00000 on the public and private test sets, respectively. Our team, UIT-NLP, ranked 3rd on the private test set, with a score equal to those of the first and second places. Our code is publicly available for research purposes.
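To make the formulation and the reported metric concrete, the following is a minimal sketch, not the system's actual implementation. It assumes each example is a pair of article texts with a binary label (1 if a link exists, 0 otherwise), and it computes Macro F1 as the unweighted mean of per-class F1. The names `PairExample` and `macro_f1` and the toy texts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PairExample:
    """Hypothetical container for one sentence-pair classification example."""
    text_a: str  # text of the first article (assumed field name)
    text_b: str  # text of the second article (assumed field name)
    label: int   # 1 if a link exists between the articles, else 0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either input."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented for illustration only.
examples = [
    PairExample("Hanoi is the capital of Vietnam.", "Vietnam is a country in Asia.", 1),
    PairExample("Hanoi is the capital of Vietnam.", "The piano is a keyboard instrument.", 0),
]
gold = [ex.label for ex in examples]
perfect_score = macro_f1(gold, gold)
partial_score = macro_f1([1, 0, 1, 0], [1, 0, 0, 0])
```

Because Macro F1 averages the classes with equal weight, a score of 1.00000 requires every linked and every non-linked pair in the test set to be classified correctly.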