Background: Extractive question-answering (EQA) is a natural language processing (NLP) application that answers patient-specific questions by locating the answers in the patient's clinical notes. Realistic clinical questions can have multiple answers and can ask about multiple focus points at once, but existing datasets for developing artificial intelligence solutions lack both features. Objective: To create a dataset for developing and evaluating clinical EQA systems that can handle natural multi-answer and multi-focus questions. Methods: We leveraged the annotated relations from the 2018 National NLP Clinical Challenges (n2c2) corpus to generate an EQA dataset. Specifically, we included the 1-to-N, M-to-1, and M-to-N drug-reason relations to form multi-answer and multi-focus QA entries, which pose more complex and natural challenges in addition to the basic one-drug-one-reason cases. We developed a baseline solution and tested it on the dataset. Results: The derived RxWhyQA dataset contains 96,939 QA entries. Among the answerable questions, 25% require multiple answers and 2% ask about multiple drugs within one question. Cues frequently appear around the answers in the text, and 90% of the drug and reason terms occur within the same or an adjacent sentence. The baseline EQA solution achieved a best F1-measure of 0.72 on the entire dataset. On specific subsets, it achieved 0.93 on unanswerable questions, 0.48 on single-drug questions versus 0.60 on multi-drug questions, and 0.54 on single-answer questions versus 0.43 on multi-answer questions. Discussion: The RxWhyQA dataset can be used to train and evaluate systems that need to handle multi-answer and multi-focus questions. Multi-answer EQA in particular appears to be challenging and therefore warrants more research investment.
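The Methods step of turning drug-reason relations into QA entries can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes relations arrive as plain (drug, reason) string pairs, treats each connected group of drugs and reasons as one QA entry, and uses a made-up question template. The names `group_relations` and `to_qa_entry`, the output field names, and the sample pairs are all hypothetical.

```python
from collections import defaultdict


def group_relations(pairs):
    """Group (drug, reason) pairs into connected components.

    Assumption: drugs and reasons linked through shared relations belong to
    one QA entry. The abstract does not specify the actual grouping rule.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Tag nodes by role so a drug and a reason with the same text stay distinct.
    for drug, reason in pairs:
        union(("drug", drug), ("reason", reason))

    groups = defaultdict(lambda: {"drugs": [], "reasons": []})
    for drug, reason in pairs:
        group = groups[find(("drug", drug))]
        if drug not in group["drugs"]:
            group["drugs"].append(drug)
        if reason not in group["reasons"]:
            group["reasons"].append(reason)
    return list(groups.values())


def to_qa_entry(group):
    """Build one QA entry and label it 1-to-1, 1-to-N, M-to-1, or M-to-N."""
    drugs, reasons = group["drugs"], group["reasons"]
    left = "1" if len(drugs) == 1 else "M"
    right = "1" if len(reasons) == 1 else "N"
    # Hypothetical question template, for illustration only.
    question = "Why was the patient prescribed " + " and ".join(drugs) + "?"
    return {
        "question": question,
        "answers": reasons,
        "relation_type": f"{left}-to-{right}",
    }


# Illustrative input only; these pairs are not taken from the n2c2 corpus.
pairs = [
    ("lasix", "edema"),
    ("lasix", "CHF"),
    ("aspirin", "CAD"),
    ("plavix", "CAD"),
    ("tylenol", "pain"),
]
for entry in (to_qa_entry(g) for g in group_relations(pairs)):
    print(entry["relation_type"], "|", entry["question"], "|", entry["answers"])
```

In this sketch a multi-answer question corresponds to a group with more than one reason, and a multi-focus question to a group with more than one drug. A real implementation would work on annotated text spans with character offsets in the clinical note rather than on bare strings.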