Question Answering for Electronic Health Records: A Scoping Review of Datasets and Models

Question Answering (QA) systems on patient-related data can assist both clinicians and patients. They can, for example, support clinicians in decision-making and help patients better understand their medical history. Significant amounts of patient data are stored in Electronic Health Records (EHRs), making EHR QA an important research area. In EHR QA, the answer is obtained from the patient's medical record. Because of differences in data format and modality, EHR QA differs greatly from other medical QA tasks that retrieve answers from medical websites or scientific papers, and it therefore warrants dedicated study. This study aimed to provide a methodological review of existing works on QA over EHRs. We searched four digital sources (Google Scholar, ACL Anthology, ACM Digital Library, and PubMed) for articles published from January 1st, 2005 to September 30th, 2023 to collect relevant publications on EHR QA. We identified 4111 papers, and after screening against our inclusion criteria, 47 papers remained for further study. Of these 47 papers, 25 concerned EHR QA datasets and 37 concerned EHR QA models; the two counts overlap because some papers address both. We observed that QA on EHRs is relatively new and unexplored, and most of the works are fairly recent. emrQA is by far the most popular EHR QA dataset, both in terms of citations and usage in other papers. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used for these models.
