Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, Medical Decision-making Agents (MDAgents), that helps address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes adapted to tasks of varying complexity. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and medical diagnosis benchmarks. MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge and multi-modal reasoning, showing a significant improvement of up to 6.5% (p < 0.05) over the best performance of previous methods. Ablation studies reveal that MDAgents effectively determines medical complexity to optimize for efficiency and accuracy across diverse medical tasks. Notably, the combination of moderator review and external medical knowledge in group collaboration resulted in an average accuracy improvement of 11.8%. Our code can be found at https://github.com/mitmedialab/MDAgents.
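The routing idea summarized above, in which a query is assigned either a solo or a group collaboration structure according to its estimated complexity, can be illustrated with a minimal sketch. This is not the MDAgents implementation: the `Plan` type, the `assign_structure` function, the numeric threshold, the agent role names, and the toy complexity estimator are all hypothetical and introduced only for illustration. In practice the complexity judgment would presumably come from an LLM rather than a word count.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical labels: the abstract only distinguishes solo from group collaboration.
SOLO = "solo"
GROUP = "group"


@dataclass
class Plan:
    structure: str      # SOLO or GROUP
    agents: List[str]   # illustrative role names, not taken from the paper


def assign_structure(
    query: str,
    estimate_complexity: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, chosen arbitrarily for the sketch
) -> Plan:
    """Pick a collaboration structure from an estimated complexity score in [0, 1]."""
    score = estimate_complexity(query)
    if score < threshold:
        # Low-complexity query: a single agent answers on its own.
        return Plan(SOLO, ["general_clinician"])
    # Higher-complexity query: several agents collaborate, with a moderator reviewing.
    return Plan(GROUP, ["specialist_a", "specialist_b", "moderator"])


def toy_complexity(query: str) -> float:
    """Stand-in for an LLM-based judgment: treats longer queries as more complex."""
    return min(len(query.split()) / 20.0, 1.0)


if __name__ == "__main__":
    plan = assign_structure("What is aspirin?", toy_complexity)
    print(plan.structure, plan.agents)
```

Passing the estimator in as a callable keeps the routing logic independent of how complexity is judged, so the word-count stand-in could be swapped for a model call without changing `assign_structure`.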