Long-tailed imbalanced distributions are a common issue in practical computer vision applications. Previous works have proposed methods to address this problem, which can be grouped into several categories: re-sampling, re-weighting, transfer learning, and feature augmentation. In recent years, diffusion models have shown impressive generative ability in many computer vision tasks. However, this generative power has not been explored for long-tailed problems. We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle this issue. First, we encode the imbalanced dataset into features using the baseline model. Then, we train a Denoising Diffusion Implicit Model (DDIM) on these encoded features to generate pseudo-features. Finally, we train the classifier on both the encoded features and the pseudo-features obtained in the previous two steps. The proposed method improves the model's accuracy on the CIFAR-LT and ImageNet-LT datasets.
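The three-step pipeline described above (encode features, train a DDIM on them to generate pseudo-features, retrain the classifier) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration. The "encoded features" are synthetic Gaussian clusters with hypothetical long-tailed class counts that stand in for a baseline model's output. The neural denoiser of a real DDIM is replaced by one linear, class-conditional noise predictor per timestep, fit by least squares. The noise schedule, the number of steps, and the rule of topping up every class to the head-class count are also assumed values. Sampling uses the deterministic DDIM update (eta = 0), and the final classifier is a plain softmax regression that stands in for the model's classification layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stand-in for step 1: "encoded features" from a baseline model. ---
# Hypothetical: Gaussian class clusters with long-tailed class counts.
NUM_CLASSES, DIM = 5, 8
counts = [200, 80, 30, 12, 5]  # head -> tail (assumed imbalance)
means = rng.normal(0.0, 3.0, size=(NUM_CLASSES, DIM))
feats = np.vstack([means[c] + rng.normal(size=(n, DIM)) for c, n in enumerate(counts)])
labels = np.concatenate([np.full(n, c) for c, n in enumerate(counts)])

# --- Step 2a: fit a toy class-conditional noise predictor on the features. ---
T = 50                                  # assumed number of diffusion steps
betas = np.linspace(1e-4, 0.2, T)       # assumed linear beta schedule
alpha_bar = np.cumprod(1.0 - betas)

def design(x_t, y):
    """Predictor inputs: noisy feature plus one-hot label (acts as a per-class bias)."""
    return np.hstack([x_t, np.eye(NUM_CLASSES)[y]])

# One linear eps-predictor per timestep, fit by least squares. This replaces
# the neural denoiser of a real DDIM and is only adequate for this toy data.
REPEATS = 4                             # draw several noise samples per feature
x0_rep, y_rep = np.tile(feats, (REPEATS, 1)), np.tile(labels, REPEATS)
W = []
for t in range(T):
    eps = rng.normal(size=x0_rep.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0_rep + np.sqrt(1.0 - alpha_bar[t]) * eps
    coef, *_ = np.linalg.lstsq(design(x_t, y_rep), eps, rcond=None)
    W.append(coef)

# --- Step 2b: deterministic DDIM sampling (eta = 0) of pseudo-features. ---
def ddim_sample(y):
    x = rng.normal(size=(len(y), DIM))  # start from pure noise
    for t in range(T - 1, -1, -1):
        eps_hat = design(x, y) @ W[t]
        # Predicted clean feature, then step to the previous noise level.
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x

# Top up every class to the head-class count (an assumed augmentation rule).
target = max(counts)
pseudo_y = np.concatenate([np.full(target - n, c) for c, n in enumerate(counts)])
pseudo_x = ddim_sample(pseudo_y)

# --- Step 3: train a softmax classifier on encoded + pseudo-features. ---
X = np.vstack([feats, pseudo_x])
Y = np.concatenate([labels, pseudo_y])

def train_softmax(X, Y, lr=0.02, epochs=1000):
    Wc, bc = np.zeros((X.shape[1], NUM_CLASSES)), np.zeros(NUM_CLASSES)
    onehot = np.eye(NUM_CLASSES)[Y]
    for _ in range(epochs):
        logits = X @ Wc + bc
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(X)                  # d(mean cross-entropy)/d(logits)
        Wc -= lr * X.T @ grad
        bc -= lr * grad.sum(axis=0)
    return Wc, bc

Wc, bc = train_softmax(X, Y)
train_acc = np.mean(np.argmax(feats @ Wc + bc, axis=1) == labels)
print(f"accuracy on real features: {train_acc:.3f}")
```

The least-squares predictor works here only because each synthetic class is Gaussian with identity covariance, so the ideal noise prediction is linear in the noisy feature and the label. Real encoded features would need a learned network in its place.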