A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy

To safely deploy deep learning models in the clinic, a quality assurance (QA) framework is needed for routine or continuous monitoring of input-domain shift and of model performance without ground truth contours. In this work, cardiac substructure segmentation was used as an example task to establish such a framework. A benchmark dataset consisting of computed tomography (CT) images and manual cardiac delineations of 241 patients was collected, including one 'common' image domain and five 'uncommon' domains. Segmentation models were tested on the benchmark dataset for an initial evaluation of model capacity and limitations. An image domain shift detector was developed using a trained denoising autoencoder (DAE) and two hand-engineered features. A variational autoencoder (VAE) was also trained to estimate the shape quality of the auto-segmentation results. Using the features extracted from the image/segmentation pair as inputs, a regression model was trained to predict the per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC). The framework was tested across 19 segmentation models to evaluate its generalizability. The regression models predicted DSC with a mean absolute error (MAE) ranging from 0.036 to 0.046, with an average MAE of 0.041. When tested on the benchmark dataset, the performance of all segmentation models was not significantly affected by the scanning parameters: field of view (FOV), slice thickness, and reconstruction kernel. For input images with Poisson noise, CNN-based segmentation models showed a decreased DSC ranging from 0.07 to 0.41, while the transformer-based model was not significantly affected.
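The two metrics named in the abstract have standard definitions: DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B, and MAE is the mean of |predicted DSC − measured DSC| over patients. The sketch below illustrates both on toy data. It is not the authors' implementation; the function names, the flattened-list mask representation, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the foreground sizes of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE between predicted and measured per-patient DSC values."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Toy masks (hypothetical data): 3 foreground voxels each, 2 shared.
    auto_mask = [1, 1, 0, 0, 1]
    manual_mask = [1, 0, 0, 1, 1]
    print(dice_coefficient(auto_mask, manual_mask))

    # Toy per-patient DSC values (hypothetical, not from the paper).
    print(mean_absolute_error([0.80, 0.90], [0.85, 0.86]))
```

In the framework described above, the "predicted" values would come from the regression model and the "measured" values from comparing auto-segmentations against manual contours on the benchmark set; at deployment time only the prediction is available.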

