In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature, blood pressure, ECG) and imaging results (e.g., MRI, X-ray) from a measurement agent to mimic the real-world diagnostic process. Additionally, we incorporate self-improvement mechanisms that allow models to iteratively refine their diagnostic strategies. We enhance LLM performance in our simulated setting by integrating multi-agent discussions, chain-of-thought reasoning, and experience-based knowledge retrieval, facilitating progressive learning as doctor agents interact with more patients. We also introduce an evaluation benchmark for assessing an LLM's ability to engage in dynamic, context-aware diagnostic interactions. While MedAgentSim is fully automated, it also supports a user-controlled mode, enabling human interaction with either the doctor or patient agent. Comprehensive evaluations across a variety of simulated diagnostic scenarios demonstrate the effectiveness of our approach. Our code, simulation tool, and benchmark are available at \url{https://medagentsim.netlify.app/}.