Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations

Multidisciplinary team (MDT) consultations are the gold standard for cancer care decision-making, yet current practice lacks structured mechanisms for quantifying consensus and ensuring decision traceability. We introduce a Multi-Agent Medical Decision Consensus Matrix System that deploys seven specialized large language model agents, including an oncologist, a radiologist, a nurse, a psychologist, a patient advocate, a nutritionist and a rehabilitation therapist, to simulate realistic MDT workflows. The framework incorporates a mathematically grounded consensus matrix that uses Kendall's coefficient of concordance to objectively assess agreement. To further enhance treatment recommendation quality and consensus efficiency, the system integrates reinforcement learning methods, including Q-Learning, PPO and DQN. Evaluation across five medical benchmarks (MedQA, PubMedQA, DDXPlus, MedBullets and SymCat) shows substantial gains over existing approaches, achieving an average accuracy of 87.5% compared with 83.8% for the strongest baseline, a consensus achievement rate of 89.3% and a mean Kendall's W of 0.823. Expert reviewers rated the clinical appropriateness of system outputs at 8.9/10. The system guarantees full evidence traceability through mandatory citations of clinical guidelines and peer-reviewed literature, following GRADE principles. This work advances medical AI by providing structured consensus measurement, role-specialized multi-agent collaboration and evidence-based explainability to improve the quality and efficiency of clinical decision-making.
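To make the agreement measure concrete, the sketch below computes the textbook form of Kendall's coefficient of concordance, W = 12S / (m^2 (n^3 - n)), for m raters who each rank the same n items without ties. It is an illustration only: the abstract does not say how the system builds its consensus matrix, so the function name `kendalls_w`, the seven-by-four input layout and the example rankings are all hypothetical.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[int]]) -> float:
    """Kendall's W for m raters ranking the same n items (no tie correction).

    rankings[j][i] is the rank (1..n) that rater j assigns to item i.
    Returns a value in [0, 1]; 1 means all raters produced identical rankings.
    """
    m = len(rankings)
    if m < 2:
        raise ValueError("need at least two raters")
    n = len(rankings[0])
    if n < 2:
        raise ValueError("need at least two items")

    # Each rater must supply a complete ranking 1..n with no ties.
    expected = list(range(1, n + 1))
    for r in rankings:
        if sorted(r) != expected:
            raise ValueError("each ranking must be a permutation of 1..n")

    # Sum of ranks received by each item across all raters.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2

    # S: squared deviations of the rank sums from their mean.
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical example: seven agents each rank four treatment options.
agent_rankings = [
    [1, 2, 3, 4],  # oncologist
    [1, 2, 4, 3],  # radiologist
    [2, 1, 3, 4],  # nurse
    [1, 3, 2, 4],  # psychologist
    [1, 2, 3, 4],  # patient advocate
    [2, 1, 3, 4],  # nutritionist
    [1, 2, 3, 4],  # rehabilitation therapist
]
print(f"Kendall's W = {kendalls_w(agent_rankings):.3f}")
```

A value close to 1 indicates that the agents order the options in nearly the same way, while a value close to 0 indicates no agreement beyond chance. Rankings that contain ties would need the tie-corrected form of W, which this sketch omits.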
