NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports

How can we interpret and retrieve medical evidence to support clinical decisions? Clinical trial reports (CTRs) amassed over the years contain indispensable information for the development of personalized medicine. However, it is practically infeasible to manually inspect more than 400,000 clinical trial reports to find the best evidence for experimental treatments. Natural Language Inference (NLI) offers a potential solution to this problem by allowing the scalable computation of textual entailment. However, existing NLI models perform poorly on biomedical corpora, and previously published datasets fail to capture the full complexity of inference over CTRs. In this work, we present a novel resource to advance research on NLI for reasoning on CTRs. The resource includes two main tasks: first, determining the inference relation between a natural language statement and a CTR; second, retrieving supporting facts to justify the predicted relation. We provide NLI4CT, a corpus of 2,400 statements and CTRs annotated for these tasks. Baselines on this corpus expose the limitations of existing NLI models: across six state-of-the-art NLI models, the highest F1 score achieved is 0.627. To the best of our knowledge, we are the first to design a task that covers the interpretation of full CTRs. To encourage further work on this challenging dataset, we make the corpus, competition leaderboard, website, and code to replicate the baseline experiments available at: https://github.com/ai-systems/nli4ct
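To make the two tasks concrete, here is a minimal Python sketch built on stated assumptions. The `Instance` fields, the label strings `"Entailment"`/`"Contradiction"`, the word-overlap retriever, and the `threshold` value are all hypothetical illustrations. They are not the NLI4CT data schema or the authors' baselines. The sketch pairs one statement with CTR sentences, applies a toy evidence retriever for the second task, and scores both tasks with F1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical NLI4CT-style example; field names are illustrative only."""
    statement: str            # natural language claim to be verified
    ctr_sentences: list[str]  # premise: sentences taken from a clinical trial report
    label: str                # assumed label set: "Entailment" or "Contradiction"
    evidence: list[int]       # indices of gold supporting sentences


def tokens(text: str) -> set[str]:
    """Lower-cased word set with simple punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve_evidence(inst: Instance, threshold: float = 0.2) -> list[int]:
    """Toy retriever (assumption): keep sentences that overlap the statement enough."""
    stmt = tokens(inst.statement)
    selected = []
    for i, sentence in enumerate(inst.ctr_sentences):
        sent = tokens(sentence)
        # Fraction of the sentence's tokens that also occur in the statement.
        if sent and len(stmt & sent) / len(sent) >= threshold:
            selected.append(i)
    return selected


def evidence_f1(gold: list[int], pred: list[int]) -> float:
    """Set-based F1 between gold and predicted evidence indices."""
    tp = len(set(gold) & set(pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(set(pred)), tp / len(set(gold))
    return 2 * precision * recall / (precision + recall)


def binary_f1(gold: list[str], pred: list[str], positive: str = "Entailment") -> float:
    """F1 for the inference-relation task, treating `positive` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy example, not taken from the corpus.
inst = Instance(
    statement="Patients in the primary trial received 50 mg of drug X daily.",
    ctr_sentences=[
        "Intervention 1: drug X, 50 mg orally once daily.",
        "Outcome: progression-free survival at 12 months.",
    ],
    label="Entailment",
    evidence=[0],
)
predicted = retrieve_evidence(inst)
print(predicted, evidence_f1(inst.evidence, predicted))  # prints [0] 1.0
```

The overlap heuristic only illustrates the input and output shape of the retrieval task. A real system would replace it with a trained model, which is where the abstract reports existing NLI models falling short.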