Automating the generation of differential diagnosis (DDx), that is, predicting a list of candidate diseases from a patient's symptom description, is critical to clinical reasoning and to applications such as decision support. Providing the reasoning or interpretation behind these differential diagnoses is more meaningful still. Large language models (LLMs) possess powerful language processing abilities and have proven effective in various related tasks. Motivated by this potential, we investigate the use of LLMs for interpretable DDx. First, we develop a new DDx dataset with expert-derived interpretations for 570 public clinical notes. Second, we propose a novel framework, named Dual-Inf, that enables LLMs to conduct bidirectional inference for interpretation. Both human and automated evaluations demonstrate the effectiveness of Dual-Inf in predicting differentials and explaining diagnoses. Specifically, Dual-Inf improves on the baseline methods by more than 32% in BERTScore for DDx interpretation. Furthermore, experiments verify that Dual-Inf (1) makes fewer errors in interpretation, (2) generalizes well, and (3) is promising for rare disease diagnosis and explanation.