NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports

How can we interpret and retrieve medical evidence to support clinical decisions? Clinical trial reports (CTRs) amassed over the years contain indispensable information for the development of personalized medicine. However, it is practically infeasible to manually inspect more than 400,000 clinical trial reports in order to find the best evidence for experimental treatments. Natural Language Inference (NLI) offers a potential solution to this problem by allowing the scalable computation of textual entailment. However, existing NLI models perform poorly on biomedical corpora, and previously published datasets fail to capture the full complexity of inference over CTRs. In this work, we present a novel resource to advance research on NLI for reasoning on CTRs. The resource includes two main tasks: first, to determine the inference relation between a natural language statement and a CTR; second, to retrieve supporting facts to justify the predicted relation. We provide NLI4CT, a corpus of 2400 statements and CTRs, annotated for these tasks. Baselines on this corpus expose the limitations of existing NLI models, with 6 state-of-the-art NLI models achieving a maximum F1 score of 0.627. To the best of our knowledge, we are the first to design a task that covers the interpretation of full CTRs. To encourage further work on this challenging dataset, we make the corpus, competition leaderboard, website, and code to replicate the baseline experiments available at: https://github.com/ai-systems/nli4ct
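As a rough illustration of how the two tasks could be scored with F1, the sketch below treats the first task as binary label prediction and the second as retrieval of a set of evidence sentence indices. This is not the official evaluation code from the repository. The function names, the label strings, and the choice of "Entailment" as the positive class are assumptions made for this example.

```python
from __future__ import annotations


def f1_score(gold: set[int], predicted: set[int]) -> float:
    """F1 between a gold set and a predicted set of item indices."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def label_f1(gold_labels: list[str], predicted_labels: list[str],
             positive: str = "Entailment") -> float:
    """Task 1 (sketch): F1 over statements, for an assumed positive label."""
    gold = {i for i, label in enumerate(gold_labels) if label == positive}
    pred = {i for i, label in enumerate(predicted_labels) if label == positive}
    return f1_score(gold, pred)


def evidence_f1(gold_evidence: set[int], predicted_evidence: set[int]) -> float:
    """Task 2 (sketch): F1 over the CTR sentence indices selected as evidence."""
    return f1_score(gold_evidence, predicted_evidence)


if __name__ == "__main__":
    # Hypothetical labels for four statements (illustrative, not from the corpus).
    gold = ["Entailment", "Contradiction", "Entailment", "Contradiction"]
    pred = ["Entailment", "Entailment", "Contradiction", "Contradiction"]
    print(label_f1(gold, pred))

    # Hypothetical evidence indices into the sentences of one CTR.
    print(evidence_f1({0, 1, 2}, {1, 2, 3}))
```

Sharing one set-based F1 helper between both tasks keeps the sketch short. An actual evaluation may aggregate scores differently, for example by averaging evidence F1 over all statements.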
