Generative Reasoning Integrated Label Noise Robust Deep Image Representation Learning

Deep learning based image representation learning (IRL) methods have attracted great attention for various image understanding problems. Most of these methods require a large number of high-quality annotated training images, which can be time-consuming and costly to gather. To reduce labeling costs, crowdsourced data, automatic labeling procedures or citizen science projects can be considered. However, such approaches increase the risk of including label noise in the training data. When discriminative reasoning is employed, label noise may result in overfitting on noisy labels. This leads to sub-optimal learning procedures, and thus to inaccurate characterization of images. To address this, we introduce a generative reasoning integrated label noise robust deep representation learning (GRID) approach. Our approach aims to model the complementary characteristics of discriminative and generative reasoning for IRL under noisy labels. To this end, we first integrate generative reasoning into discriminative reasoning through a supervised variational autoencoder. This allows GRID to automatically detect training samples with noisy labels. Then, through our label noise robust hybrid representation learning strategy, GRID adjusts the whole learning procedure so that IRL of these samples proceeds through generative reasoning and IRL of the other samples through discriminative reasoning. Our approach learns discriminative image representations while preventing interference from noisy labels, independently of the IRL method selected. Thus, unlike existing methods, GRID does not depend on the type of annotation, neural network architecture, loss function or learning task, and can be directly utilized for various problems. Experimental results show its effectiveness compared to state-of-the-art methods. The code of GRID is publicly available at https://github.com/gencersumbul/GRID.
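The hybrid strategy described above can be illustrated with a minimal sketch. It is not the GRID implementation: the abstract does not state how noisy samples are detected, so the median-based rule in `flag_noisy`, the `factor` parameter, and both function names are hypothetical stand-ins. The sketch only shows the routing idea, where samples flagged as noisy contribute a generative loss term and all other samples contribute a discriminative one.

```python
from statistics import median


def flag_noisy(disc_losses, factor=2.0):
    """Hypothetical detection rule (an assumption, not taken from the paper):
    flag a sample when its discriminative loss exceeds factor * median loss."""
    threshold = factor * median(disc_losses)
    return [loss > threshold for loss in disc_losses]


def hybrid_loss(disc_losses, gen_losses, noisy_flags):
    """Mean per-sample loss: flagged samples use the generative term,
    all other samples use the discriminative term."""
    if not (len(disc_losses) == len(gen_losses) == len(noisy_flags)):
        raise ValueError("all inputs must have the same length")
    if not disc_losses:
        raise ValueError("inputs must not be empty")
    per_sample = [
        gen if noisy else disc
        for disc, gen, noisy in zip(disc_losses, gen_losses, noisy_flags)
    ]
    return sum(per_sample) / len(per_sample)


# Illustrative per-sample losses (made-up numbers, not experimental results).
disc = [0.2, 0.3, 2.5, 0.25]   # e.g. classification loss per sample
gen = [1.0, 1.1, 0.9, 1.2]     # e.g. reconstruction-based loss per sample
flags = flag_noisy(disc)
loss = hybrid_loss(disc, gen, flags)
```

In this toy batch only the third sample is flagged, so its discriminative loss is replaced by its generative loss before averaging. In practice the per-sample losses would come from the supervised variational autoencoder and the chosen IRL method.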
