Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet opacity concerns have limited the clinical adoption of black-box AI systems. We tested whether chain-of-thought reasoning improves agentic planning in a retrospective cohort of 41 patients with brain metastases treated with 18 Gy single-fraction SRS. We developed SAGE (Secure Agent for Generative Dose Expertise), an LLM-based agent for automated SRS treatment planning. Two variants generated plans for each case: one using a non-reasoning model and one using a reasoning model. The reasoning variant achieved plan dosimetry comparable to that of human planners on the primary endpoints (PTV coverage, maximum dose, conformity index, gradient index; all p > 0.21) while reducing cochlear dose below human baselines (p = 0.022). When prompted to improve conformity, the reasoning model demonstrated systematic planning behaviors, including prospective constraint verification (457 instances) and trade-off deliberation (609 instances), whereas the non-reasoning model showed almost none (0 and 7 instances, respectively). Content analysis showed that constraint verification and causal explanation were concentrated in the reasoning agent. The optimization traces serve as auditable logs, offering a path toward transparent automated planning.