Deep learning models for medical image segmentation can fail unexpectedly and spectacularly on pathological cases and on images acquired at centers other than those that provided the training images, producing labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of these models. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on artificial intelligence (AI). In this work, we propose a theoretical framework for trustworthy AI and a practical system that can augment any backbone AI system with a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards the voxel-level labels predicted by the backbone AI that violate expert knowledge and relies on a fallback for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal MRI, consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of a state-of-the-art backbone AI for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.
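To make the fail-safe idea concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that expert knowledge is available as a per-voxel boolean mask of admissible classes and that both the backbone and the fallback output per-voxel class probabilities. The function name `fail_safe_combine` and the parameters `allowed` and `eps` are hypothetical. Under these assumptions, Dempster's rule of combination applied to the backbone probabilities and a mass placed entirely on the admissible set reduces to masking and renormalizing; where the conflict is total, the rule is undefined and the sketch switches to the fallback.

```python
import numpy as np


def fail_safe_combine(backbone_probs, fallback_probs, allowed, eps=1e-8):
    """Illustrative fail-safe combination (hypothetical helper).

    backbone_probs, fallback_probs: arrays of shape (..., C) with per-voxel
        class probabilities.
    allowed: boolean array of shape (..., C); True where expert knowledge
        (an assumption of this sketch) permits the class at that voxel.
    Returns the combined probabilities and a boolean map of voxels where
    the backbone prediction was discarded in favor of the fallback.
    """
    backbone_probs = np.asarray(backbone_probs, dtype=float)
    fallback_probs = np.asarray(fallback_probs, dtype=float)
    allowed = np.asarray(allowed, dtype=bool)

    # Dempster's rule for a probability mass combined with a mass that is
    # concentrated on the admissible set: keep admissible classes only.
    kept = np.where(allowed, backbone_probs, 0.0)

    # Remaining mass after removing conflict (1 - degree of conflict).
    agreement = kept.sum(axis=-1, keepdims=True)

    # Total conflict: the backbone puts (almost) no mass on any admissible class.
    violated = agreement <= eps

    # Renormalize, avoiding division by zero at fully conflicting voxels.
    combined = kept / np.where(violated, 1.0, agreement)

    # At fully conflicting voxels, rely on the fallback prediction instead.
    result = np.where(violated, fallback_probs, combined)
    return result, violated[..., 0]
```

In this sketch, a voxel where the backbone assigns probabilities (0.7, 0.2, 0.1) while only the last two classes are admissible ends up with approximately (0, 0.67, 0.33), and a voxel where the backbone puts all of its mass on an inadmissible class receives the fallback probabilities unchanged.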