Machine reasoning has made great progress in recent years owing to large language models (LLMs). In the clinical domain, however, most NLP-driven projects focus on clinical classification or reading comprehension, and clinical reasoning for disease diagnosis remains under-explored because rationale annotation by clinicians is expensive. In this work, we present a ``reasoning-aware'' diagnosis framework that rationalizes the diagnostic process via prompt-based learning in a time- and labor-efficient manner, and learns to reason over the prompt-generated rationales. Specifically, we address clinical reasoning for disease diagnosis, where the LLM generates diagnostic rationales that provide its insight into the presented patient data and the reasoning path towards the diagnosis, namely Clinical Chain-of-Thought (Clinical CoT). We empirically demonstrate the clinical reasoning ability of LLMs/LMs through extensive experiments and analyses on both rationale generation and disease diagnosis in various settings. We further propose a novel set of criteria for evaluating the potential of machine-generated rationales for real-world clinical settings, to facilitate future research in this area.