Sharing medical reports is essential for patient-centered care, and a recent line of work has focused on generating such reports automatically with NLP methods. However, different audiences bring different purposes to writing and reading medical reports: healthcare professionals care more about pathology, whereas patients are more concerned with the diagnosis ("Is there any abnormality?"). This expectation gap commonly leaves patients finding their medical reports ambiguous and therefore unsure about the next steps. In this work, we explore the audience expectation gap in healthcare and group the common ambiguities that confuse patients about their diagnosis into three categories: medical jargon, contradictory findings, and misleading grammatical errors. Based on this analysis, we define a disambiguation rewriting task: regenerate an input so that it is unambiguous while preserving the information in the original content. We further propose a rewriting algorithm based on contrastive pretraining and perturbation-based rewriting. In addition, we create two datasets, OpenI-Annotated (based on chest reports) and VA-Annotated (based on general medical reports), each with binary labels for ambiguity and for abnormality presence annotated by radiology specialists. Experimental results on these datasets show that our proposed algorithm effectively rewrites input sentences in a less ambiguous way with high content fidelity. Our code and annotated data are released to facilitate future research.