Large language models (LLMs), despite remarkable progress across general domains, encounter significant barriers in medicine and healthcare. The field poses distinctive challenges, such as domain-specific terminology and reasoning over specialized knowledge. To address these issues, we propose a Multi-disciplinary Collaboration (MC) framework for the medical domain in which role-playing LLM-based agents take part in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free and interpretable framework comprises five steps: gathering domain experts, proposing individual analyses, summarizing these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting. Results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) show that the MC framework is effective at mining and harnessing the medical expertise in LLMs and at extending their reasoning abilities. Based on these outcomes, we further conduct a human evaluation to identify and categorize common errors of our method, along with ablation studies that examine the impact of various factors on overall performance. Our code can be found at \url{https://github.com/gersteinlab/MedAgents}.
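The five steps listed in the abstract can be read as a simple control loop. The following is a minimal sketch of that loop under stated assumptions, not the authors' implementation: the function name `mc_answer`, the `llm` callable (prompt in, text out), the prompt wording, the "yes" agreement convention, and the `n_experts` and `max_rounds` parameters are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def mc_answer(question: str, options: str, llm: LLM,
              n_experts: int = 3, max_rounds: int = 3) -> str:
    """Illustrative zero-shot pass through the five steps (all prompts assumed)."""
    # Step 1: gather domain experts by asking the model for relevant specialties.
    reply = llm(f"List {n_experts} medical specialties, comma-separated, "
                f"relevant to: {question}")
    experts: List[str] = [e.strip() for e in reply.split(",") if e.strip()][:n_experts]

    # Step 2: each role-played expert proposes an individual analysis.
    analyses = [
        llm(f"You are a {e} expert. Analyze: {question}\nOptions: {options}")
        for e in experts
    ]

    # Step 3: summarize the analyses into a single report.
    report = llm("Summarize these analyses into one report:\n" + "\n".join(analyses))

    # Step 4: iterate over discussions until every expert agrees (or rounds run out).
    for _ in range(max_rounds):
        votes = [
            llm(f"You are a {e} expert. Do you agree with this report? "
                f"Answer 'yes' or give a revision.\n{report}")
            for e in experts
        ]
        revisions = [v for v in votes if not v.strip().lower().startswith("yes")]
        if not revisions:
            break  # consensus reached
        report = llm("Revise the report using this feedback:\n"
                     + report + "\n" + "\n".join(revisions))

    # Step 5: make the final decision from the agreed report.
    return llm("Based on the report, answer the question with one option.\n"
               f"Report: {report}\nQuestion: {question}\nOptions: {options}")
```

Because the model is passed in as a plain callable, the loop involves no training and every intermediate analysis, vote, and report revision is available for inspection, which mirrors the training-free and interpretable properties the abstract claims.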