Improvements in AI performance have accelerated its integration into scientific research. In particular, using generative AI to formulate scientific hypotheses is promising and is increasingly applied across fields. However, when AI-generated hypotheses inform critical decisions such as medical diagnoses, verifying their reliability is crucial. In this study, we consider a medical diagnostic task that uses images generated by diffusion models, and we propose a statistical test to quantify the reliability of the diagnostic results. The basic idea is to employ a selective inference framework, in which the test is conditioned on the fact that the generated images are produced by a trained diffusion model. With the proposed method, the statistical reliability of medical image diagnostic results can be quantified as a p-value, allowing decision-making with a controlled error rate. We establish the theoretical validity of the proposed test and demonstrate its effectiveness through numerical experiments on synthetic and brain image datasets.
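To illustrate why conditioning on the selection step matters, the following is a minimal sketch of selective inference in a textbook toy setting. It is not the test proposed in this study. The selection rule (test a statistic only when it exceeds a threshold) is a hypothetical stand-in for the data-driven step performed by the diffusion model, and all function names and parameters are assumptions made for this example. Under the null, the naive p-value ignores the selection and rejects far too often among selected cases, whereas the p-value computed from the truncated null distribution keeps the rejection rate near the nominal level.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal null distribution (assumed for this toy example)


def naive_p_value(z):
    """One-sided p-value that ignores how z came to be tested."""
    return 1.0 - STD_NORMAL.cdf(z)


def selective_p_value(z, threshold):
    """One-sided p-value conditional on the selection event z > threshold.

    Under the null, z given z > threshold follows a truncated standard
    normal, so the tail probability is renormalised by P(Z > threshold).
    """
    if z <= threshold:
        raise ValueError("z does not satisfy the selection event z > threshold")
    return (1.0 - STD_NORMAL.cdf(z)) / (1.0 - STD_NORMAL.cdf(threshold))


def rejection_rates(n_trials=100_000, threshold=1.0, alpha=0.05, seed=0):
    """Simulate the null and return (naive, selective) rejection rates
    among the statistics that passed the hypothetical selection rule."""
    rng = random.Random(seed)
    selected = naive_rejections = selective_rejections = 0
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)  # the null hypothesis is true in every trial
        if z <= threshold:
            continue  # not selected, so no test is carried out
        selected += 1
        naive_rejections += naive_p_value(z) < alpha
        selective_rejections += selective_p_value(z, threshold) < alpha
    return naive_rejections / selected, selective_rejections / selected


if __name__ == "__main__":
    naive_rate, selective_rate = rejection_rates()
    print(f"naive rejection rate among selected:     {naive_rate:.3f}")
    print(f"selective rejection rate among selected: {selective_rate:.3f}")
```

In this toy setting the selection event is a simple half-line, so the conditional null distribution has a closed form. Characterising the selection event induced by a trained generative model is considerably harder and is not attempted here.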