Large language models (LLMs) are increasingly used for diagnostic tasks in medicine. In clinical practice, the correct diagnosis can rarely be inferred immediately from the initial patient presentation alone. Rather, reaching a diagnosis often involves systematic history taking, during which clinicians reason over multiple potential conditions through iterative questioning to resolve uncertainty. This process requires considering differential diagnoses and actively excluding emergencies that demand immediate intervention. Yet, the ability of medical LLMs to generate informative follow-up questions and thus reason over differential diagnoses remains underexplored. Here, we introduce MedClarify, an information-seeking AI agent that generates follow-up questions for iterative reasoning to support diagnostic decision-making. Specifically, MedClarify computes a list of candidate diagnoses analogous to a differential diagnosis, and then proactively generates follow-up questions aimed at reducing diagnostic uncertainty. By selecting the question with the highest expected information gain, MedClarify enables targeted, uncertainty-aware reasoning that improves diagnostic performance. In our experiments, we first demonstrate the limitations of current LLMs in medical reasoning, which often yield multiple, similarly likely diagnoses, especially when patient cases are incomplete or information relevant to the diagnosis is missing. We then show that our information-theoretic reasoning approach can generate effective follow-up questions and thereby reduce diagnostic errors by ~27 percentage points (p.p.) compared to a standard single-shot LLM baseline. Altogether, MedClarify offers a path to improving medical LLMs through agentic information-seeking and thus to promoting effective dialogues with medical LLMs that reflect the iterative and uncertain nature of real-world clinical reasoning.
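The question-selection criterion described above can be sketched as follows. This is a minimal, hypothetical illustration of expected information gain over a candidate-diagnosis distribution, not the authors' implementation: the toy diagnoses, the yes/no question, and the likelihood values are all assumptions made for the example.

```python
# Hypothetical sketch: pick the follow-up question with the highest expected
# information gain (EIG) over a distribution of candidate diagnoses.
# EIG(question) = H(prior) - E_answer[H(posterior)], i.e. the expected
# reduction in entropy of the diagnosis distribution after hearing the answer.
import math

def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, likelihoods):
    """EIG of one question.

    prior:       dict diagnosis -> P(diagnosis)
    likelihoods: dict answer -> dict diagnosis -> P(answer | diagnosis)
    """
    h_prior = entropy(prior.values())
    expected_posterior_entropy = 0.0
    for answer, like in likelihoods.items():
        # Marginal probability of this answer: P(a) = sum_d P(a | d) P(d)
        p_answer = sum(like[d] * prior[d] for d in prior)
        if p_answer == 0:
            continue
        # Bayes update: P(d | a) = P(a | d) P(d) / P(a)
        posterior = [like[d] * prior[d] / p_answer for d in prior]
        expected_posterior_entropy += p_answer * entropy(posterior)
    return h_prior - expected_posterior_entropy

# Toy differential with two similarly likely diagnoses (illustrative values).
prior = {"appendicitis": 0.5, "gastroenteritis": 0.5}

# A discriminative question: its answer depends strongly on the diagnosis.
informative = {"yes": {"appendicitis": 0.9, "gastroenteritis": 0.2},
               "no":  {"appendicitis": 0.1, "gastroenteritis": 0.8}}
# An uninformative question: its answer is independent of the diagnosis.
uninformative = {"yes": {"appendicitis": 0.5, "gastroenteritis": 0.5},
                 "no":  {"appendicitis": 0.5, "gastroenteritis": 0.5}}

# The agent would ask the question with the higher EIG.
assert expected_information_gain(prior, informative) > \
       expected_information_gain(prior, uninformative)
```

In this sketch, the uninformative question yields an EIG of zero (the posterior equals the prior for either answer), while the discriminative question yields a positive EIG, so an agent ranking questions by this score would select the discriminative one.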