Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet it remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical deployment requires reliable cues indicating where models may be wrong. In this work, we propose a budget-aware, uncertainty-driven quality assurance (QA) framework built on nnU-Net that combines uncertainty quantification and post-hoc calibration to produce voxel-wise uncertainty maps (based on predictive entropy) for guiding targeted manual review. We compare temperature scaling (TS), deep ensembles (DE), checkpoint ensembles (CE), and test-time augmentation (TTA), evaluated both individually and in combination, using TMLI as a representative use case. Reliability is assessed through ROI-masked calibration metrics and through uncertainty-error alignment under realistic revision constraints, summarized as the AUC over the top 0-5% most uncertain voxels. Across configurations, segmentation accuracy remains stable, whereas TS substantially improves calibration. Uncertainty-error alignment improves most with calibrated checkpoint-based inference, yielding uncertainty maps that more consistently highlight regions requiring manual edits. Overall, integrating calibration with efficient ensembling appears to be a promising strategy for implementing a budget-aware QA workflow for radiotherapy segmentation.
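To make the quantities named above concrete, the following is a minimal NumPy sketch of temperature-scaled softmax, voxel-wise predictive entropy, and a budget-restricted uncertainty-error alignment score. It is an illustration under stated assumptions rather than the implementation used in this work: the function names (`softmax`, `predictive_entropy`, `error_recall_at_budget`, `budget_auc`) are hypothetical, and the alignment score is assumed here to be the fraction of erroneous ROI voxels recovered when reviewing the most uncertain voxels, integrated over review budgets from 0 to 5% and normalized by the maximum budget. The exact metric definition may differ.

```python
import numpy as np


def softmax(logits, temperature=1.0, axis=0):
    """Temperature-scaled softmax over the class axis (T=1 leaves the logits unscaled)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def predictive_entropy(probs, axis=0, eps=1e-12):
    """Voxel-wise entropy of the predictive class distribution."""
    return -(probs * np.log(probs + eps)).sum(axis=axis)


def error_recall_at_budget(uncertainty, errors, roi, budget):
    """Fraction of erroneous ROI voxels among the top `budget` fraction of
    most uncertain ROI voxels (assumed alignment measure, see lead-in)."""
    u = uncertainty[roi]
    e = errors[roi].astype(bool)
    n_review = int(np.ceil(budget * u.size))
    if n_review == 0 or not e.any():
        return 0.0
    top = np.argsort(-u, kind="stable")[:n_review]  # most uncertain voxels first
    return float(e[top].sum() / e.sum())


def budget_auc(uncertainty, errors, roi, max_budget=0.05, steps=11):
    """Trapezoidal area under the recall-vs-budget curve on [0, max_budget],
    divided by max_budget so the score lies in [0, 1]."""
    budgets = np.linspace(0.0, max_budget, steps)
    recalls = np.array(
        [error_recall_at_budget(uncertainty, errors, roi, b) for b in budgets]
    )
    area = np.sum((recalls[1:] + recalls[:-1]) / 2.0 * np.diff(budgets))
    return float(area / max_budget)
```

In this sketch, `logits` has the class axis first, `errors` is a boolean map marking voxels where the prediction disagrees with the reference contour, and `roi` is a boolean mask restricting the evaluation to the region of interest. Ensemble or TTA variants would average the softmax outputs of several forward passes before computing the entropy.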