This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools for white matter lesion (WML) segmentation in magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured-output segmentation tasks. First, we postulate that a reliable uncertainty measure should assign high uncertainty values to predictions that are likely to be incorrect. Second, we investigate the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient). We hypothesize that uncertainty at each scale is related to specific types of errors. Our study aims to confirm this relationship through separate analyses of in-domain and out-of-domain settings. Our primary methodological contributions are (i) the development of novel measures for quantifying uncertainty at the lesion and patient scales, derived from structural prediction discrepancies, and (ii) the extension of an error retention curve analysis framework to evaluate UQ performance at both the lesion and patient scales. Results on a multi-centric MRI dataset of 444 patients demonstrate that our proposed measures capture model errors at the lesion and patient scales more effectively than measures that average voxel-scale uncertainty values. The code for the UQ protocols is available at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.
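To make the idea of a retention curve concrete, the following is a minimal voxel-scale sketch and not the implementation from the linked repository. It assumes binary predictions and ground truth, uses the Dice score as the quality metric, and replaces the most uncertain predictions with the ground truth as the retained fraction shrinks. The function names (`dice`, `dice_retention_curve`, `area_under_curve`) and the toy arrays are hypothetical and chosen only for illustration; the lesion- and patient-scale extension described in the abstract is not reproduced here.

```python
import numpy as np


def dice(pred, target):
    """Dice score of two binary arrays (defined as 1.0 when both are empty)."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom


def dice_retention_curve(pred, target, uncertainty, fractions):
    """Dice score when only the least-uncertain fraction of predictions is
    retained and the remaining ones are replaced by the ground truth."""
    pred, target = pred.ravel(), target.ravel()
    # Indices sorted from most certain to least certain prediction.
    order = np.argsort(uncertainty.ravel(), kind="stable")
    scores = []
    for f in fractions:
        n_keep = int(round(f * pred.size))
        corrected = target.copy()                      # start from ground truth
        corrected[order[:n_keep]] = pred[order[:n_keep]]  # keep certain predictions
        scores.append(dice(corrected, target))
    return np.array(scores)


def area_under_curve(fractions, scores):
    """Trapezoidal area under the retention curve."""
    fractions, scores = np.asarray(fractions), np.asarray(scores)
    return float(np.sum(np.diff(fractions) * (scores[1:] + scores[:-1]) / 2.0))


# Toy example (assumed data): two wrong voxels, at indices 1 and 5.
pred = np.array([1, 1, 0, 0, 1, 0])
target = np.array([1, 0, 0, 0, 1, 1])
informative = np.array([0.1, 0.9, 0.1, 0.1, 0.1, 0.8])    # high where pred is wrong
uninformative = np.array([0.9, 0.1, 0.9, 0.9, 0.9, 0.1])  # high where pred is right

fractions = np.linspace(0.0, 1.0, 7)
auc_informative = area_under_curve(
    fractions, dice_retention_curve(pred, target, informative, fractions))
auc_uninformative = area_under_curve(
    fractions, dice_retention_curve(pred, target, uninformative, fractions))
print(auc_informative, auc_uninformative)
```

Under this convention, an uncertainty measure that ranks the erroneous voxels as most uncertain yields a larger area under the Dice retention curve, because the errors are the first predictions to be replaced. The same comparison could in principle be made with lesions or patients as the ranked units instead of voxels, given a suitable per-unit quality metric.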