Artificial intelligence is increasingly integrated into radiotherapy workflows, yet such pipelines remain vulnerable to out-of-distribution image data that may introduce unexpected behavior in clinical tasks. Deep learning-based anomaly detection for pelvic magnetic resonance imaging (MRI) remains largely unexplored, and transparent evaluation of its feasibility for full automation is limited. We developed and evaluated a fully automated, unsupervised anomaly-detection framework for pelvic and brain MRI. A two-stage framework was trained on reference images from public datasets: LUND-PROBE for pelvic MRI, and IXI, fastMRI, and fastMRI+ for brain MRI. In the first stage, MRI slices were compressed into discrete tokens; in the second, the distribution of normal tokens was modeled. Anomaly evidence was estimated by combining perceptual image differences with token-surprisal scores based on negative log-likelihood. Automated detection was evaluated on pelvic MRI with synthetic global and real clinical anomalies, and on brain MRI with clinically annotated fastMRI+ abnormalities. Sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and false-positive behavior in held-out normal cases were assessed. The framework achieved robust detection across hidden evaluation cohorts, with AUCs of 0.97 (95% CI, 0.95-0.98) and 0.81 (95% CI, 0.74-0.87) for pelvic and brain MRI, respectively. Heatmap analysis showed strong spatial agreement between detected anomalies and ground-truth locations, supporting localization accuracy and interpretability. These results support the potential of unsupervised anomaly detection as an automated MRI quality-control layer for radiotherapy workflows, with transparent visualization of image regions likely to compromise downstream AI-based tasks.
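The scoring step described above, combining an image-difference map with token surprisal from negative log-likelihood, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the min-max normalization, the nearest-neighbor upsampling, and the equal-weight fusion are all assumptions made for illustration. The difference map is taken as a precomputed input, standing in for whatever perceptual metric the framework uses.

```python
import numpy as np

def token_surprisal(logits, tokens):
    """Per-token negative log-likelihood in nats.

    logits: (H, W, K) unnormalized scores over a K-entry codebook (assumed layout).
    tokens: (H, W) integer indices of the observed tokens.
    """
    # Numerically stable log-softmax over the codebook axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of each observed token and negate it.
    return -np.take_along_axis(log_probs, tokens[..., None], axis=-1)[..., 0]

def min_max(x):
    """Rescale to [0, 1]; a constant map becomes all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x, dtype=float)

def anomaly_heatmap(diff_map, surprisal, weight=0.5):
    """Fuse a pixel-level difference map with a token-level surprisal grid.

    Assumes a square token grid whose side divides the image side evenly.
    The weighted sum is a hypothetical fusion rule, not taken from the paper.
    """
    factor = diff_map.shape[0] // surprisal.shape[0]
    # Nearest-neighbor upsampling of the token grid to pixel resolution.
    surprisal_up = np.kron(surprisal, np.ones((factor, factor)))
    return weight * min_max(diff_map) + (1.0 - weight) * min_max(surprisal_up)

# Toy example: a 4x4 token grid over a 16x16 slice with one unlikely token.
logits = np.zeros((4, 4, 8))
tokens = np.zeros((4, 4), dtype=int)
logits[1, 2, 0] = -5.0            # the prior finds this observed token unlikely
diff_map = np.zeros((16, 16))
diff_map[4:8, 8:12] = 1.0         # matching region of high image difference
heatmap = anomaly_heatmap(diff_map, token_surprisal(logits, tokens))
```

In this toy case both evidence sources point at the same 4x4 pixel block, so the fused heatmap peaks there. A slice-level score could then be derived from the heatmap (for example its maximum or an upper percentile), though the aggregation actually used is not stated in the abstract.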