Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions

Background: Extractive question answering (EQA) is a natural language processing (NLP) application that answers patient-specific questions by locating the answers in the patient's clinical notes. Realistic clinical EQA questions can have multiple answers and can ask about multiple focus points at once, but existing datasets for developing artificial intelligence solutions lack both features. Objective: Create a dataset for developing and evaluating clinical EQA systems that can handle natural multi-answer and multi-focus questions. Methods: We used the annotated relations in the 2018 National NLP Clinical Challenges (n2c2) corpus to generate an EQA dataset. Specifically, we included the 1-to-N, M-to-1, and M-to-N drug-reason relations to form multi-answer and multi-focus QA entries, which pose more complex and natural challenges than the basic one-drug-one-reason cases. We developed a baseline solution and tested it on the dataset. Results: The derived RxWhyQA dataset contains 96,939 QA entries. Among the answerable questions, 25% require multiple answers and 2% ask about multiple drugs within one question. Cues frequently appear around the answers in the text, and 90% of the drug and reason terms occur within the same or an adjacent sentence. The baseline EQA solution achieved a best F1 score of 0.72 on the entire dataset. On specific subsets, it achieved 0.93 on unanswerable questions, 0.48 on single-drug versus 0.60 on multi-drug questions, and 0.54 on single-answer versus 0.43 on multi-answer questions. Discussion: The RxWhyQA dataset can be used to train and evaluate systems that need to handle multi-answer and multi-focus questions. Multi-answer EQA in particular appears to be challenging and therefore warrants more research investment.
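The Methods step of turning drug-reason relations into QA entries can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes relations arrive as simple `(drug, reason)` string pairs from one note, groups them into connected components of the drug-reason graph, and labels each component as 1-to-1, 1-to-N, M-to-1, or M-to-N. The function names `group_relations` and `to_qa_entry`, the question template, and the output fields are all hypothetical.

```python
from collections import defaultdict


def group_relations(pairs):
    """Group (drug, reason) pairs into connected components.

    Assumption: two pairs belong to the same QA entry when they share
    a drug or a reason, directly or through a chain of shared terms.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag nodes so a drug and a reason with the same string never collide.
    for drug, reason in pairs:
        union(("drug", drug), ("reason", reason))

    groups = defaultdict(lambda: {"drugs": set(), "reasons": set()})
    for drug, reason in pairs:
        root = find(("drug", drug))
        groups[root]["drugs"].add(drug)
        groups[root]["reasons"].add(reason)
    return list(groups.values())


def to_qa_entry(group):
    """Build one hypothetical QA entry from a grouped component."""
    drugs = sorted(group["drugs"])
    reasons = sorted(group["reasons"])
    m = "M" if len(drugs) > 1 else "1"
    n = "N" if len(reasons) > 1 else "1"
    return {
        # Illustrative template only; the dataset's real wording may differ.
        "question": "Why was " + " and ".join(drugs) + " prescribed?",
        "answers": reasons,
        "relation_type": f"{m}-to-{n}",
    }


if __name__ == "__main__":
    # Made-up example relations, not taken from the n2c2 corpus.
    example = [
        ("aspirin", "pain"),
        ("aspirin", "fever"),
        ("metformin", "diabetes"),
    ]
    for entry in (to_qa_entry(g) for g in group_relations(example)):
        print(entry)
```

In this sketch a multi-answer question corresponds to an entry with more than one item in `answers`, and a multi-focus question to an entry whose question names more than one drug.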

