This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools for white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured output segmentation tasks. First, we postulate that a good uncertainty measure should assign high uncertainty values to predictions that are likely to be incorrect. Second, we investigate the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient). We hypothesize that uncertainty at each scale is related to specific types of errors, and we aim to confirm this relationship through separate analyses for in-domain and out-of-domain settings. Our primary methodological contributions are (i) the development of novel measures for quantifying uncertainty at the lesion and patient scales, derived from structural prediction discrepancies, and (ii) the extension of an error retention curve analysis framework to evaluate UQ performance at both the lesion and patient scales. Results on a multi-centric MRI dataset of 172 patients demonstrate that our proposed measures capture model errors at the lesion and patient scales more effectively than measures that average voxel-scale uncertainty values. The code for the UQ protocols is available at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.
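To make the idea of an error retention curve concrete, the following is a minimal voxel-scale sketch in Python. It is not taken from the authors' repository, and it does not implement the lesion- or patient-scale extension described in the abstract. It assumes one common formulation: voxels are ranked by uncertainty, the most uncertain predictions are progressively replaced with the ground truth, and a quality metric (here the Dice score, chosen as an assumption) is recomputed at each retention fraction. The function names `dice`, `retention_curve`, and `area_under_curve`, as well as the toy data, are hypothetical.

```python
import numpy as np


def dice(pred, target):
    """Dice score between two binary arrays; defined as 1.0 when both are empty."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom


def retention_curve(pred, target, uncertainty, n_steps=11):
    """Voxel-scale error retention curve (illustrative formulation, an assumption).

    At retention fraction f, the f most certain voxels keep the model
    prediction and the remaining voxels are replaced with the ground truth.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    unc = np.asarray(uncertainty).ravel()
    order = np.argsort(unc, kind="stable")  # most certain voxels first
    fractions = np.linspace(0.0, 1.0, n_steps)
    scores = []
    for f in fractions:
        n_keep = int(round(f * pred.size))
        corrected = target.copy()
        corrected[order[:n_keep]] = pred[order[:n_keep]]
        scores.append(dice(corrected, target))
    return fractions, np.array(scores)


def area_under_curve(fractions, scores):
    """Trapezoidal area under the retention curve; higher suggests better UQ."""
    widths = fractions[1:] - fractions[:-1]
    heights = (scores[1:] + scores[:-1]) / 2.0
    return float(np.sum(widths * heights))


# Toy example: the prediction is wrong at voxels 1 and 5.
pred = np.array([1, 1, 0, 0, 1, 0])
target = np.array([1, 0, 0, 0, 1, 1])
unc_informative = np.array([0.1, 0.9, 0.1, 0.1, 0.1, 0.8])    # high where wrong
unc_uninformative = np.array([0.9, 0.1, 0.8, 0.7, 0.6, 0.1])  # high where correct

fr, s_good = retention_curve(pred, target, unc_informative, n_steps=7)
_, s_bad = retention_curve(pred, target, unc_uninformative, n_steps=7)
print(area_under_curve(fr, s_good), area_under_curve(fr, s_bad))
```

In this toy case the uncertainty map that is high on the erroneous voxels yields the larger area under the curve, which matches the postulate that a good uncertainty measure should flag likely errors. A lesion- or patient-scale analogue would rank lesions or patients instead of voxels; the exact construction used in the paper is not specified in the abstract.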