Long-tailed and noisily labeled data both appear frequently in real-world applications and pose significant challenges for learning. Most prior work treats the two problems in isolation and does not explicitly consider how they interact. Our empirical observations reveal that such solutions fail to consistently improve learning when the dataset is both long-tailed and corrupted by label noise. Moreover, in the presence of label noise, existing methods do not yield uniform improvements across sub-populations: some sub-populations gain accuracy at the cost of hurting others. Based on these observations, we introduce the Fairness Regularizer (FR), inspired by regularizing the performance gap between any two sub-populations. We show that the proposed regularizer improves both the performance of tail sub-populations and the overall learning performance. Extensive experiments demonstrate the effectiveness of the proposed solution when combined with certain popular robust or class-balanced methods.
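To make the idea of regularizing the performance gap between any two sub-populations concrete, the following is a minimal sketch and not the paper's actual formulation of FR. The function names, the use of accuracy as the performance measure, the mean absolute pairwise gap, and the weight `lam` are all assumptions introduced here for illustration.

```python
from itertools import combinations


def group_accuracies(preds, labels, groups):
    """Accuracy per sub-population; `groups` gives each sample's group id."""
    correct, total = {}, {}
    for p, y, g in zip(preds, labels, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(p == y)
    return {g: correct[g] / total[g] for g in total}


def pairwise_gap_penalty(acc_by_group):
    """Mean absolute accuracy gap over all pairs of groups (hypothetical form)."""
    pairs = list(combinations(sorted(acc_by_group), 2))
    if not pairs:
        return 0.0  # a single group has no gap to penalize
    gaps = [abs(acc_by_group[a] - acc_by_group[b]) for a, b in pairs]
    return sum(gaps) / len(pairs)


def regularized_objective(base_loss, acc_by_group, lam=1.0):
    """Base training loss plus a weighted gap penalty; `lam` is an assumed knob."""
    return base_loss + lam * pairwise_gap_penalty(acc_by_group)


# Toy example: a head group classified perfectly, a tail group half right.
acc = group_accuracies(
    preds=[0, 0, 1, 1],
    labels=[0, 0, 1, 0],
    groups=["head", "head", "tail", "tail"],
)
objective = regularized_objective(base_loss=1.0, acc_by_group=acc, lam=2.0)
```

Accuracy is not differentiable, so a penalty of this kind would need a differentiable surrogate (for example, per-group average loss) before it could be used in gradient-based training; the sketch only illustrates what quantity is being penalized.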