Label noise and class imbalance commonly coexist in real-world data. Previous work on robust learning, however, usually addresses only one of these data biases and underperforms when both are present. To close this gap, this work presents a novel meta-learning based dynamic loss that automatically adjusts the objective function as training progresses, so that a classifier can be learned robustly from long-tailed noisy data. Concretely, our dynamic loss comprises a label corrector and a margin generator, which respectively correct noisy labels and generate additive per-class classification margins by perceiving the underlying data distribution as well as the learning state of the classifier. Equipped with a new hierarchical sampling strategy that enriches a small amount of unbiased metadata with diverse and hard samples, the two components of the dynamic loss are optimized jointly through meta-learning and train the classifier to adapt well to clean and balanced test data. Extensive experiments show that our method achieves state-of-the-art accuracy on multiple real-world and synthetic datasets with various types of data biases, including CIFAR-10/100, Animal-10N, ImageNet-LT, and Webvision. Code will soon be publicly available.
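To make the two components of the dynamic loss concrete, the following is a minimal sketch of a cross-entropy computed against a corrected soft label after per-class margins are added to the logits. It is an illustration under stated assumptions, not the authors' implementation: the function names, the fixed `corrected_label` and `margins` values, and the choice to add margins to the logits before the softmax (with a positive margin on the head class, as in logit adjustment) are all hypothetical. In the method described above, both quantities would be produced by meta-learned modules rather than fixed by hand.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_norm for z in logits]


def margin_cross_entropy(logits, target, margins):
    """Cross-entropy against a (possibly soft) target with per-class margins.

    Assumption: margins are additive on the logits before the softmax.
    The abstract only says the margins are additive and per-class.
    """
    shifted = [z + m for z, m in zip(logits, margins)]
    log_probs = log_softmax(shifted)
    return -sum(t * lp for t, lp in zip(target, log_probs))


# Hypothetical 3-class example: class 0 is a head class, class 2 a tail class.
logits = [2.0, 0.5, 0.1]

# Stand-in for the label corrector's output: the observed label was class 0,
# and the corrector has moved most of the mass to class 2.
corrected_label = [0.2, 0.0, 0.8]

# Stand-in for the margin generator's output (values chosen by hand here).
margins = [1.0, 0.0, -1.0]

plain_loss = margin_cross_entropy(logits, corrected_label, [0.0, 0.0, 0.0])
margin_loss = margin_cross_entropy(logits, corrected_label, margins)

print(f"loss without margins: {plain_loss:.4f}")
print(f"loss with margins:    {margin_loss:.4f}")
```

With these hand-picked values the margins raise the loss on the tail-class target, which pushes the classifier to separate the tail class more strongly during training. The sign convention is a design choice of this sketch and may differ from the one used in the paper.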