The growing reliance on Artificial Intelligence (AI) in critical domains such as healthcare demands robust mechanisms to ensure the trustworthiness of these systems, especially when faced with unexpected or anomalous inputs. This paper introduces the Open Medical Imaging Benchmarks for Out-Of-Distribution Detection (OpenMIBOOD), a comprehensive framework for evaluating out-of-distribution (OOD) detection methods specifically in medical imaging contexts. OpenMIBOOD includes three benchmarks from diverse medical domains, encompassing 14 datasets divided into covariate-shifted in-distribution, near-OOD, and far-OOD categories. We evaluate 24 post-hoc methods across these benchmarks, providing a standardized reference to advance the development and fair comparison of OOD detection methods. Results reveal that findings from broad-scale OOD benchmarks in natural image domains do not translate to medical applications, underscoring the critical need for such benchmarks in the medical field. By mitigating the risk of exposing AI models to inputs outside their training distribution, OpenMIBOOD aims to support the advancement of reliable and trustworthy AI systems in healthcare. The repository is available at https://github.com/remic-othr/OpenMIBOOD.