Many anomaly detection approaches, especially deep learning methods, have recently been developed to identify abnormal image morphology using only normal images during training. Unfortunately, many prior anomaly detection methods were optimized for a specific "known" abnormality (e.g., brain tumor, bone fracture, cell types). Moreover, even though only normal images were used in the training process, abnormal images were often employed during the validation process (e.g., epoch selection, hyper-parameter tuning), which might unintentionally leak the supposedly "unknown" abnormality. In this study, we investigated these two essential aspects of universal anomaly detection in medical images by (1) comparing various anomaly detection methods across four medical datasets, (2) investigating the inevitable but often neglected issue of how to select the optimal anomaly detection model without bias during the validation phase using only normal images, and (3) proposing a simple decision-level ensemble method that leverages the advantages of different kinds of anomaly detection without knowing the abnormality. Our experiments indicate that none of the evaluated methods consistently achieved the best performance across all datasets. Our proposed method generally improved the robustness of performance (average AUC 0.956).
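To make the idea of a decision-level ensemble concrete, the following is a minimal sketch, not necessarily the procedure used in this study. It assumes each detector outputs one scalar anomaly score per image (higher meaning more anomalous), standardizes each detector's scores with statistics taken from normal validation images only, and averages the standardized scores. The z-score calibration, the function names, and the example numbers are all illustrative assumptions.

```python
import numpy as np


def fit_normal_stats(val_scores_normal):
    """Per-detector mean and std of anomaly scores on normal validation images.

    val_scores_normal: array-like of shape (n_detectors, n_val_images).
    Only normal images are used, so no abnormality information is needed.
    """
    scores = np.asarray(val_scores_normal, dtype=float)
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    # Guard against a constant-score detector to avoid division by zero.
    std = np.where(std > 0, std, 1.0)
    return mean, std


def ensemble_scores(test_scores, mean, std):
    """Fuse detectors at the decision level (assumed rule: mean of z-scores).

    test_scores: array-like of shape (n_detectors, n_test_images).
    Returns one fused anomaly score per test image.
    """
    scores = np.asarray(test_scores, dtype=float)
    z = (scores - mean) / std  # put all detectors on a comparable scale
    return z.mean(axis=0)


# Hypothetical scores from two detectors with very different output ranges.
val_normal = [[0.10, 0.12, 0.08, 0.10],
              [10.0, 12.0, 9.0, 9.0]]
test = [[0.10, 0.30],   # detector A: image 0 looks normal, image 1 does not
        [10.0, 25.0]]   # detector B agrees

mean, std = fit_normal_stats(val_normal)
fused = ensemble_scores(test, mean, std)
print(fused)
```

Standardizing against normal-only validation scores keeps the calibration free of abnormal images, in line with the validation constraint described above. Other fusion rules (for example, rank averaging or taking the maximum) would fit the same interface.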