We present a novel approach to out-of-distribution (OOD) detection that combines deep metric learning with synthetic data generated by diffusion models. A popular approach to OOD detection is outlier exposure, in which a model is trained on a mixture of in-distribution (ID) samples and "seen" OOD samples. On the OOD samples, the model is trained to minimize the KL divergence between its output probabilities and the uniform distribution, while still classifying the ID samples correctly. We propose a label-mixup approach that generates synthetic OOD data with Denoising Diffusion Probabilistic Models (DDPMs), and we explore recent advances in metric learning to train our models. In our experiments, metric-learning-based loss functions outperform the softmax loss. Furthermore, the baseline models, both softmax and metric learning, improve significantly when trained with the generated OOD data. Our approach outperforms strong baselines on conventional OOD detection metrics.
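The outlier exposure objective described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the weight `lam`, and the direction of the divergence (here KL(U || p), which differs from the cross-entropy to the uniform distribution only by a constant) are assumptions made for illustration, since the abstract does not specify them.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, label):
    """Standard classification loss for one ID sample."""
    return -log_softmax(logits)[label]

def kl_uniform_to_pred(logits):
    """KL(U || p) for one OOD sample, where p = softmax(logits).

    Assumption: the divergence direction is not stated in the text.
    KL(U || p) = -log(K) - mean(log p) for K classes.
    """
    log_p = log_softmax(logits)
    k = len(log_p)
    return -math.log(k) - sum(log_p) / k

def outlier_exposure_loss(id_batch, ood_batch, lam=0.5):
    """Mean ID cross-entropy plus lam times the mean OOD uniformity penalty.

    id_batch:  list of (logits, label) pairs for in-distribution samples
    ood_batch: list of logits for "seen" (or synthetic) OOD samples
    lam:       hypothetical trade-off weight, chosen here for illustration
    """
    id_term = sum(cross_entropy(z, y) for z, y in id_batch) / len(id_batch)
    ood_term = sum(kl_uniform_to_pred(z) for z in ood_batch) / len(ood_batch)
    return id_term + lam * ood_term

if __name__ == "__main__":
    id_batch = [([4.0, 0.5, 0.1], 0), ([0.2, 3.0, 0.3], 1)]
    ood_batch = [[0.1, 0.0, 0.2], [5.0, 0.0, 0.0]]
    print(outlier_exposure_loss(id_batch, ood_batch))
```

The OOD term is zero when the prediction is exactly uniform and grows as the model becomes confident on an OOD input, so minimizing it pushes OOD predictions toward maximum uncertainty. In the setting of this paper, `ood_batch` would hold the model's logits on the diffusion-generated samples.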