Large language models (LLMs) have shown promise in clinical diagnosis but remain limited by unreliable report generation, weak evidence grounding, and opaque reasoning. We propose MedCollab, an IBIS-guided multi-agent framework for full-cycle clinical diagnosis and diagnostic report generation. Mimicking hospital consultation, MedCollab dynamically recruits specialist and exam agents from patient records. Each diagnostic hypothesis is structured through the Issue-Based Information System (IBIS) into evidence-linked arguments, improving traceability and auditability. MedCollab further constructs Hierarchical Disease Relation Chains (HDRC) to organize accepted hypotheses into clinically meaningful pathological and comorbidity relations. A verifier-guided consensus module audits reasoning quality, detects contradictions, and updates agent weights over multiple rounds. Experiments on ClinicalBench and MIMIC-IV show that MedCollab outperforms strong LLM and medical multi-agent baselines in diagnostic accuracy, department routing, evidence consistency, and report quality. These results demonstrate that structured argumentation and disease-relation modeling can improve the reliability, transparency, and clinical coherence of LLM-based diagnosis.
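As a rough illustration of what structuring a diagnostic hypothesis into evidence-linked IBIS arguments could look like, the sketch below uses the standard IBIS elements (an issue, candidate positions, and pro/con arguments) and attaches evidence identifiers to each argument. This is not MedCollab's implementation: the class names, the `evidence_trail` and `ungrounded_arguments` helpers, and the example findings are all hypothetical and chosen only to show how traceability might be checked.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    """A pro or con argument about a position (hypothetical schema)."""
    claim: str
    supports: bool  # True = argues for the position, False = against it
    evidence_ids: List[str] = field(default_factory=list)


@dataclass
class Position:
    """A candidate answer to an issue, here a diagnostic hypothesis."""
    diagnosis: str
    arguments: List[Argument] = field(default_factory=list)

    def evidence_trail(self) -> List[str]:
        """Return every evidence id cited by any argument, in first-seen order."""
        seen: List[str] = []
        for arg in self.arguments:
            for eid in arg.evidence_ids:
                if eid not in seen:
                    seen.append(eid)
        return seen

    def ungrounded_arguments(self) -> List[Argument]:
        """Return arguments that cite no evidence, i.e. candidates for audit."""
        return [a for a in self.arguments if not a.evidence_ids]


@dataclass
class Issue:
    """The question under discussion, with its competing positions."""
    question: str
    positions: List[Position] = field(default_factory=list)


def build_example() -> Issue:
    # All findings below are invented placeholders, not data from the paper.
    mi = Position(
        diagnosis="myocardial infarction",
        arguments=[
            Argument("Troponin is elevated", True, ["lab:troponin"]),
            Argument("ECG shows ST elevation", True, ["exam:ecg"]),
            Argument("Pain is typical for the condition", True),  # no evidence linked
        ],
    )
    pe = Position(
        diagnosis="pulmonary embolism",
        arguments=[
            Argument("D-dimer is within normal range", False, ["lab:d_dimer"]),
        ],
    )
    return Issue("What explains the patient's chest pain?", [mi, pe])


if __name__ == "__main__":
    issue = build_example()
    for pos in issue.positions:
        print(pos.diagnosis, pos.evidence_trail(), len(pos.ungrounded_arguments()))
```

In a layout like this, an auditing step could flag any argument returned by `ungrounded_arguments` before a position is accepted, which is one plausible way to make weak evidence grounding visible.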