This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools for white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured output segmentation tasks. First, we postulate that a good uncertainty measure should assign high uncertainty values to predictions that are likely to be incorrect. Second, we investigate the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient). We hypothesize that uncertainty at each scale is related to specific types of errors, and we test this relationship through separate analyses in in-domain and out-of-domain settings. Our primary methodological contributions are (i) novel measures for quantifying uncertainty at the lesion and patient scales, derived from structural prediction discrepancies, and (ii) an extension of the error retention curve analysis framework that enables evaluation of UQ performance at both the lesion and patient scales. Results on a multi-centric MRI dataset of 334 patients show that our proposed measures capture model errors at the lesion and patient scales more effectively than measures that average voxel-scale uncertainty values. The code for the UQ protocols is available at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.
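To make the error retention curve idea concrete, the sketch below shows a generic, scale-agnostic version. It assumes each prediction unit (a voxel, a lesion, or a patient) already has one uncertainty value and a binary "is wrong" flag, and it uses the plain error rate as a stand-in metric. The function names `error_retention_curve` and `area_under_curve` are hypothetical, and this is not the paper's lesion- or patient-scale extension, which may rely on different error metrics. At retention fraction t, the t·N most certain predictions are kept and the rest are assumed to be replaced by the ground truth, so they contribute no error. A lower area under the curve means the uncertainty ranks wrong predictions as more uncertain.

```python
from typing import List, Sequence, Tuple


def error_retention_curve(
    uncertainties: Sequence[float],
    errors: Sequence[int],
) -> Tuple[List[float], List[float]]:
    """Generic error retention curve (illustrative sketch, not the paper's code).

    uncertainties: one uncertainty value per prediction unit (voxel, lesion or patient).
    errors: 1 if the prediction for that unit is wrong, 0 otherwise.
    Returns (retention fractions, error rates).
    """
    n = len(uncertainties)
    if n == 0 or n != len(errors):
        raise ValueError("need equally sized, non-empty inputs")
    # Order units from most certain (lowest uncertainty) to least certain.
    order = sorted(range(n), key=lambda i: uncertainties[i])
    fractions = [0.0]
    error_rates = [0.0]  # nothing retained: every unit replaced by ground truth
    retained_errors = 0
    for k, idx in enumerate(order, start=1):
        retained_errors += errors[idx]
        fractions.append(k / n)
        # Error rate over ALL n units; units not yet retained count as correct.
        error_rates.append(retained_errors / n)
    return fractions, error_rates


def area_under_curve(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal area under the curve; lower is better for an error retention curve."""
    return sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2 for i in range(1, len(xs))
    )


if __name__ == "__main__":
    # Hypothetical toy data: units 1 and 3 are wrong.
    errs = [0, 1, 0, 1]
    good = error_retention_curve([0.1, 0.9, 0.2, 0.8], errs)  # wrong units most uncertain
    bad = error_retention_curve([0.9, 0.1, 0.8, 0.2], errs)   # wrong units most certain
    print(area_under_curve(*good), area_under_curve(*bad))
```

On this toy input, the uncertainty that flags the wrong units gives an area of 0.125, while the reversed ranking gives 0.375. Comparing such areas across candidate measures is how retention curves rank uncertainty quality.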