Lung cancer clinical decision support demands precise reasoning across complex, multi-stage oncological workflows. Existing multimodal large language models (MLLMs) fail to handle guideline-constrained staging and treatment reasoning. We formalize three oncological precision treatment (OPT) tasks for lung cancer, spanning TNM staging, treatment recommendation, and end-to-end clinical decision support. We introduce LungCURE, the first standardized multimodal benchmark built from 1,000 real-world, clinician-labeled cases across more than 10 hospitals. We further propose LCAgent, a multi-agent framework that ensures guideline-compliant lung cancer clinical decision-making by suppressing cascading reasoning errors across the clinical pathway. Experiments reveal large differences among large language models (LLMs) in complex medical reasoning under precise treatment requirements. We also verify that LCAgent, as a simple yet effective plugin, improves the reasoning performance of LLMs in real-world medical scenarios.