Deep learning models for medical image segmentation and object detection are increasingly available as clinical products. However, because details of the training data are rarely provided, such models may fail unexpectedly on cases that differ from the training distribution. An approach that allows potential users to independently test a model's robustness, treating it as a black box and using only a few cases from their own site, is key for adoption. To address this, we present a framework for testing the robustness of these models against CT image quality variation. Using this framework, we demonstrate that, given the same training data, model architecture and data pre-processing strongly affect the robustness of several widely used segmentation and object detection methods to simulated CT imaging artifacts and degradation. The framework also addresses concerns about the sustainability of deep learning models in clinical use by considering future shifts in image quality, caused by scanner deterioration or imaging protocol changes, that are not reflected in a limited local test dataset.
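To make the black-box testing idea concrete, the sketch below shows one minimal way such a robustness check could be structured in Python. It is an illustration under stated assumptions, not the paper's actual simulation pipeline: `model_predict`, `simulate_degradation`, and the specific noise and blur parameters are hypothetical stand-ins, and the degradations (additive Gaussian noise as a rough proxy for low-dose quantum noise, Gaussian blur as a proxy for reduced spatial resolution) are simpler than realistic CT artifact models.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_degradation(ct_volume, noise_std=20.0, blur_sigma=1.0, rng=None):
    """Apply simple simulated degradations to a CT volume (values in HU).

    noise_std  -- std of additive Gaussian noise in HU; a crude proxy
                  for increased quantum noise at lower dose (hypothetical
                  parameterization, not the paper's artifact model).
    blur_sigma -- Gaussian blur sigma in voxels; a crude proxy for
                  reduced spatial resolution.
    """
    rng = rng or np.random.default_rng(0)
    degraded = gaussian_filter(ct_volume.astype(np.float32), sigma=blur_sigma)
    degraded += rng.normal(0.0, noise_std, size=degraded.shape)
    return degraded

def dice(a, b, eps=1e-8):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)

def robustness_curve(model_predict, ct_volume, noise_levels):
    """Compare the black-box model's prediction on degraded inputs
    against its own clean-input prediction at each degradation level.

    model_predict is any opaque callable mapping a CT volume to a
    binary segmentation mask; no access to weights or training data
    is assumed.
    """
    reference = model_predict(ct_volume)  # prediction on the clean scan
    return [
        dice(reference, model_predict(simulate_degradation(ct_volume, noise_std=s)))
        for s in noise_levels
    ]
```

A user at a new site could run `robustness_curve` on a handful of local scans and inspect how quickly the Dice score decays with increasing degradation, which is the kind of site-specific, black-box stress test the abstract argues for.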