Clinical diagnosis is a gradual process of evidence integration, in which physicians move from symptoms and medical history to examinations, competing hypotheses, disease relations, and treatment decisions. Large language models (LLMs) have advanced medical text understanding and generation, yet their clinical use remains limited by weak evidence grounding, opaque reasoning, and inconsistent links among differential diagnosis, final diagnosis, diagnostic basis, and treatment planning. We introduce MedCollab, a multi-agent framework for full-cycle clinical diagnosis and report generation. MedCollab coordinates specialist and examination agents according to the patient record. It structures agent deliberation with an Issue-Based Information System (IBIS) protocol, so that each diagnostic position is supported by patient-specific evidence and medical knowledge. It also builds Hierarchical Disease Relation Chains (HDRC) that connect accepted hypotheses through progression, complication, and comorbidity relations. During multi-round deliberation, a verifier-guided consensus module evaluates evidence support, medical plausibility, and logical conflicts, then adjusts agent contributions and filters out unsupported reasoning. Experiments on ClinicalBench and MIMIC-IV show that MedCollab outperforms leading LLMs and medical multi-agent baselines in diagnostic accuracy, evidence consistency, and clinical reasoning quality. These results indicate that structured and auditable collaboration can produce more faithful and clinically coherent diagnostic reports.
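The abstract names three mechanisms (IBIS-structured deliberation, HDRC, and verifier-guided consensus) without specifying their data formats or scoring rules. The following is a minimal hypothetical sketch, not MedCollab's implementation. It assumes IBIS issues, positions, and arguments can be modeled as plain dataclasses. It treats a position as supported when the findings its arguments cite appear in the patient record (`evidence_support`, a stand-in verifier that omits the plausibility and conflict checks). It weights agents by an assumed `weights` dictionary in `consensus` and keeps HDRC links (`build_chain`) only between accepted hypotheses. Every name, the `threshold` of 0.5, and the scoring rule are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical IBIS-style records (not MedCollab's actual schema):
# an Issue (question) holds Positions (candidate diagnoses), and each
# Position is backed by Arguments that cite findings from the patient record.


@dataclass
class Argument:
    claim: str
    evidence: list[str]  # findings the argument cites from the patient record
    supports: bool = True  # True = argues for the position, False = against


@dataclass
class Position:
    agent: str  # specialist or examination agent that proposed the position
    diagnosis: str
    arguments: list[Argument] = field(default_factory=list)


@dataclass
class Issue:
    question: str
    positions: list[Position] = field(default_factory=list)


def evidence_support(position: Position, record: set[str]) -> float:
    """Fraction of supporting arguments whose cited evidence is all in the record.

    Assumed stand-in for the verifier; plausibility and conflict checks are omitted.
    """
    pros = [a for a in position.arguments if a.supports]
    if not pros:
        return 0.0
    grounded = sum(
        1 for a in pros if a.evidence and all(e in record for e in a.evidence)
    )
    return grounded / len(pros)


def consensus(
    issue: Issue,
    record: set[str],
    weights: dict[str, float],
    threshold: float = 0.5,
) -> str | None:
    """Sum agent weight * support per diagnosis, dropping poorly grounded positions."""
    totals: dict[str, float] = {}
    for pos in issue.positions:
        score = evidence_support(pos, record)
        if score < threshold:
            continue  # filter reasoning the stand-in verifier finds unsupported
        weight = weights.get(pos.agent, 1.0)  # unknown agents default to weight 1.0
        totals[pos.diagnosis] = totals.get(pos.diagnosis, 0.0) + weight * score
    return max(totals, key=totals.get) if totals else None


# The three relation types named in the abstract.
RELATION_TYPES = {"progression", "complication", "comorbidity"}


def build_chain(
    links: list[tuple[str, str, str]], accepted: set[str]
) -> list[tuple[str, str, str]]:
    """Keep typed (source, relation, target) links between accepted hypotheses only."""
    chain = []
    for source, relation, target in links:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if source in accepted and target in accepted:
            chain.append((source, relation, target))
    return chain
```

In this sketch, a position whose arguments cite findings absent from the record scores zero and never reaches the weighted vote. This is one simple way to make each accepted position traceable to patient-specific evidence.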