Extractive question answering (QA) systems can enable physicians and researchers to query medical records, a foundational capability for designing clinical studies and understanding patient medical history. However, building these systems typically requires expert-annotated QA pairs. Large language models (LLMs), which can perform extractive QA, depend on high-quality, domain-specific data in their prompts. We introduce XAIQA, a novel approach for generating synthetic QA pairs at scale from data naturally available in electronic health records. Our method applies the idea of a classification model explainer to generate questions and answers about the medical concepts corresponding to medical codes. In an expert evaluation with two physicians, our method identifies $2.2\times$ more semantic matches and $3.8\times$ more clinical abbreviations than two popular approaches that use sentence transformers to create QA pairs. In an ML evaluation, adding our QA pairs improves the performance of GPT-4 as an extractive QA model, including on difficult questions. In both the expert and ML evaluations, we examine the trade-offs between our method and sentence transformers for QA pair generation depending on question difficulty.
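To make the core idea concrete, one can treat an explainer's sentence-level attributions for a medical-code classifier as extractive answers. The sketch below is a toy illustration under assumptions that are ours, not the method's: the "classifier" is a hypothetical linear keyword scorer (`TOY_WEIGHTS`), the explainer is simple leave-one-sentence-out occlusion, and the question template is invented. It does not reproduce XAIQA's actual classifier, explainer, or question generation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL stand-in for a trained code classifier: a linear keyword scorer
# for one medical code. Weights are invented for illustration only.
TOY_WEIGHTS = {"cabg": 2.0, "bypass": 1.5, "graft": 1.0}


@dataclass
class QAPair:
    question: str
    answer: str   # extractive answer: a sentence copied verbatim from the note
    context: str  # the full note the answer was extracted from


def split_sentences(note: str) -> list[str]:
    # Naive split after ., ! or ?; real clinical notes need a sturdier splitter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def classifier_score(text: str, weights: dict[str, float]) -> float:
    # Sum of keyword weights over lowercase alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(weights.get(t, 0.0) for t in tokens)


def occlusion_importance(sentences: list[str], weights: dict[str, float]) -> list[float]:
    # Leave-one-sentence-out attribution: how much the code score drops
    # when each sentence is removed from the note.
    full = classifier_score(" ".join(sentences), weights)
    importances = []
    for i in range(len(sentences)):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        importances.append(full - classifier_score(rest, weights))
    return importances


def make_qa_pair(note: str, concept: str, weights: dict[str, float],
                 min_importance: float = 0.5) -> Optional[QAPair]:
    sentences = split_sentences(note)
    if not sentences:
        return None
    importances = occlusion_importance(sentences, weights)
    best = max(range(len(sentences)), key=lambda i: importances[i])
    if importances[best] < min_importance:
        return None  # no sentence explains the code strongly enough
    # HYPOTHETICAL question template built from the code's concept description.
    question = f"What in the note indicates {concept}?"
    return QAPair(question=question, answer=sentences[best], context=note)


note = "Pt is a 67yo M with hx of CAD. S/p CABG x3 in 2015. Denies chest pain today."
pair = make_qa_pair(note, "coronary artery bypass graft", TOY_WEIGHTS)
print(pair.question)  # What in the note indicates coronary artery bypass graft?
print(pair.answer)    # S/p CABG x3 in 2015.
```

In this toy, the answer is whichever sentence most drives the classifier, so it can be an abbreviated mention ("CABG") that shares no words with the question's concept description. A real pipeline would substitute a trained code classifier and a stronger explainer for the keyword weights and occlusion used here.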