Out-of-distribution detection is crucial to the safe deployment of machine learning systems. Currently, unsupervised out-of-distribution detection is dominated by generative-model-based approaches that use likelihood estimates or other measurements derived from a generative model. Reconstruction-based methods offer an alternative, in which a measure of reconstruction error is used to decide whether a sample is out-of-distribution. However, reconstruction-based approaches are less favoured, because they require careful tuning of the model's information bottleneck (such as the size of the latent dimension) to produce good results. In this work, we exploit the view of denoising diffusion probabilistic models (DDPMs) as denoising autoencoders whose bottleneck is controlled externally, through the amount of noise applied. We propose to use DDPMs to reconstruct an input that has been noised to a range of noise levels, and to use the resulting multi-dimensional reconstruction error to classify out-of-distribution inputs. We validate our approach both on standard computer-vision datasets and on higher-dimensional medical datasets. Our approach outperforms not only reconstruction-based methods, but also state-of-the-art generative-model-based approaches. Code is available at https://github.com/marksgraham/ddpm-ood.
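The idea of reconstructing from several noise levels and collecting one error per level can be illustrated with a minimal sketch. This is not the implementation from the repository above. It makes several assumptions: the trained DDPM is replaced by a hypothetical toy denoiser (the closed-form posterior mean for isotropic Gaussian data), the noise schedule is a standard linear beta schedule, the error measure is plain mean squared error, and the per-level errors are combined by Z-scoring against in-distribution reference data and averaging. All function names and parameter values are illustrative.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction alpha_bar_t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def toy_gaussian_denoiser(mu, sigma, alpha_bar):
    """Hypothetical stand-in for a trained DDPM: the posterior mean E[x0 | x_t]
    when in-distribution data follow N(mu, sigma^2 I)."""
    def reconstruct(x_t, t):
        a = alpha_bar[t]
        gain = np.sqrt(a) * sigma**2 / (a * sigma**2 + 1.0 - a)
        return mu + gain * (x_t - np.sqrt(a) * mu)
    return reconstruct


def reconstruction_errors(x0, reconstruct, timesteps, alpha_bar, rng):
    """Noise x0 to each level in `timesteps`, reconstruct, and return the
    per-sample MSE as an array of shape (batch, len(timesteps))."""
    errors = []
    for t in timesteps:
        x_hat = reconstruct(add_noise(x0, t, alpha_bar, rng), t)
        errors.append(np.mean((x_hat - x0) ** 2, axis=1))
    return np.stack(errors, axis=1)


def ood_score(errors, ref_errors):
    """Z-score each noise level against in-distribution reference errors,
    then average over levels (an assumed way to combine the error vector)."""
    z = (errors - ref_errors.mean(axis=0)) / (ref_errors.std(axis=0) + 1e-12)
    return z.mean(axis=1)


# Toy data: in-distribution samples near mu, OOD samples shifted away from it.
rng = np.random.default_rng(0)
dim, sigma = 64, 0.5
mu = np.zeros(dim)
alpha_bar = linear_alpha_bar()
timesteps = [100, 300, 500, 700, 900]  # illustrative range of noise levels
reconstruct = toy_gaussian_denoiser(mu, sigma, alpha_bar)

x_val = mu + sigma * rng.standard_normal((500, dim))        # in-distribution reference
x_in = mu + sigma * rng.standard_normal((200, dim))         # in-distribution test
x_ood = mu + 3.0 + sigma * rng.standard_normal((200, dim))  # shifted, out-of-distribution

err_val = reconstruction_errors(x_val, reconstruct, timesteps, alpha_bar, rng)
err_in = reconstruction_errors(x_in, reconstruct, timesteps, alpha_bar, rng)
err_ood = reconstruction_errors(x_ood, reconstruct, timesteps, alpha_bar, rng)

score_in = ood_score(err_in, err_val)
score_ood = ood_score(err_ood, err_val)
print("mean score, in-distribution:", score_in.mean())
print("mean score, out-of-distribution:", score_ood.mean())
```

In this toy setting, higher noise levels pull the reconstruction more strongly toward the in-distribution mean, so shifted inputs incur larger errors as the noise level grows. This mirrors the role of noise as an externally controlled bottleneck, although a real DDPM and real images would behave far less cleanly.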