One of the major barriers to using large language models (LLMs) in medicine is the perception that they make clinical decisions using uninterpretable methods that are inherently different from the cognitive processes of clinicians. In this manuscript, we develop novel diagnostic reasoning prompts to study whether LLMs can perform clinical reasoning to accurately form a diagnosis. We find that GPT4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can use clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether LLMs can be trusted for patient care. Novel prompting methods have the potential to expose the black box of LLMs, bringing them one step closer to safe and effective use in medicine.