How can we interpret and retrieve medical evidence to support clinical decisions? Clinical trial reports (CTRs) amassed over the years contain indispensable information for the development of personalized medicine. However, it is practically infeasible to manually inspect more than 400,000 clinical trial reports in order to find the best evidence for experimental treatments. Natural Language Inference (NLI) offers a potential solution to this problem by allowing the scalable computation of textual entailment. However, existing NLI models perform poorly on biomedical corpora, and previously published datasets fail to capture the full complexity of inference over CTRs. In this work, we present a novel resource to advance research on NLI for reasoning on CTRs. The resource includes two main tasks: first, determining the inference relation between a natural language statement and a CTR; second, retrieving supporting facts to justify the predicted relation. We provide NLI4CT, a corpus of 2400 statements and CTRs, annotated for these tasks. Baselines on this corpus expose the limitations of existing NLI models: across six state-of-the-art NLI models, the maximum F1 score achieved is 0.627. To the best of our knowledge, we are the first to design a task that covers the interpretation of full CTRs. To encourage further work on this challenging dataset, we make the corpus, competition leaderboard, website, and code to replicate the baseline experiments available at: https://github.com/ai-systems/nli4ct
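The two tasks described above can be illustrated with a minimal Python sketch. It is not the NLI4CT data format or the official evaluation code: the `Example` fields, the label names `"Entailment"` and `"Contradiction"`, the sentence-index representation of evidence, and the toy instance are all assumptions made for illustration. The sketch shows one plausible way to score label prediction and evidence retrieval with F1.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """Hypothetical container for one annotated instance (not the real schema)."""
    statement: str                 # natural language statement about the trial
    ctr_sentences: list[str]       # the CTR, split into sentences
    label: str                     # assumed label set: "Entailment" / "Contradiction"
    evidence: set[int] = field(default_factory=set)  # indices of supporting sentences


def label_f1(gold: list[str], pred: list[str], positive: str = "Entailment") -> float:
    """F1 for the inference task, treating `positive` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evidence_f1(gold_idx: set[int], pred_idx: set[int]) -> float:
    """F1 for the retrieval task, comparing sets of selected sentence indices."""
    tp = len(gold_idx & pred_idx)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_idx)
    recall = tp / len(gold_idx)
    return 2 * precision * recall / (precision + recall)


# Invented toy instance, for illustration only.
example = Example(
    statement="Participants in cohort 1 received a higher dose than cohort 2.",
    ctr_sentences=[
        "Cohort 1: 20 mg daily.",
        "Cohort 2: 10 mg daily.",
        "Follow-up lasted 12 weeks.",
    ],
    label="Entailment",
    evidence={0, 1},
)

gold = ["Entailment", "Contradiction", "Entailment", "Contradiction"]
pred = ["Entailment", "Entailment", "Contradiction", "Contradiction"]
print(label_f1(gold, pred))                   # inference task score
print(evidence_f1(example.evidence, {1, 2}))  # retrieval task score
```

Scoring evidence as a set of sentence indices keeps the retrieval metric independent of how a model ranks or orders the sentences it selects; a ranking-based metric would be an equally reasonable choice under different assumptions.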