Long-tailed image classification has recently attracted considerable research attention, since data distributions are long-tailed in many real-world situations. Many algorithms have been devised to address the data imbalance problem by biasing the training process towards less frequent classes. However, they usually evaluate performance on a balanced testing set or on multiple independent testing sets whose distributions differ from that of the training data. Since testing data may follow arbitrary distributions, existing evaluation strategies cannot objectively reflect actual classification performance. We set up novel evaluation benchmarks based on a series of testing sets with evolving distributions. A suite of metrics is designed to measure the accuracy, robustness, and bounds of algorithms for learning with long-tailed distributions. Based on our benchmarks, we re-evaluate existing methods on the CIFAR10 and CIFAR100 datasets, which is valuable for guiding the selection of data rebalancing techniques. We also revisit existing methods and categorize them into four types, namely data balancing, feature balancing, loss balancing, and prediction balancing, according to the stage of the training pipeline on which they focus.
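The idea of evaluating on a series of testing sets with evolving distributions can be illustrated with a minimal sketch. The abstract does not specify how the distributions evolve or how the metrics are defined, so everything below is an assumption made for illustration: the exponential class-count profile, the log-spaced sweep of imbalance ratios from head-heavy through balanced to tail-heavy, the helper names (`class_counts`, `evolving_ratios`, `subsample_indices`, `summarize`), and the use of mean, standard deviation, minimum, and maximum accuracy as stand-ins for accuracy, robustness, and bounds.

```python
import numpy as np


def class_counts(num_classes, max_per_class, imbalance_ratio):
    """Per-class sample counts following an exponential profile (assumed).

    imbalance_ratio is (count of class 0) / (count of the last class):
    > 1 puts the head on class 0, < 1 puts it on the last class, 1 is balanced.
    """
    if imbalance_ratio <= 0:
        raise ValueError("imbalance_ratio must be positive")
    position = np.arange(num_classes) / max(num_classes - 1, 1)
    profile = float(imbalance_ratio) ** (-position)
    profile = profile / profile.max()  # largest class keeps max_per_class
    return np.maximum(1, np.rint(max_per_class * profile)).astype(int)


def evolving_ratios(max_ratio, num_steps):
    """Log-spaced imbalance ratios from max_ratio down to 1 / max_ratio."""
    log_r = np.log10(max_ratio)
    return np.logspace(log_r, -log_r, num_steps)


def subsample_indices(labels, counts, rng):
    """Draw, without replacement, counts[c] indices from each class c."""
    chosen = []
    for cls, n in enumerate(counts):
        pool = np.flatnonzero(labels == cls)
        n = min(int(n), pool.size)
        chosen.append(rng.choice(pool, size=n, replace=False))
    return np.concatenate(chosen)


def summarize(accuracies):
    """Stand-in summary statistics over the series of testing sets."""
    acc = np.asarray(accuracies, dtype=float)
    return {
        "mean": acc.mean(),   # overall accuracy across distributions
        "std": acc.std(),     # spread, as a rough robustness proxy
        "lower": acc.min(),   # worst case over the series
        "upper": acc.max(),   # best case over the series
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Labels shaped like the CIFAR10 test split: 10 classes, 1000 images each.
    labels = np.repeat(np.arange(10), 1000)
    for ratio in evolving_ratios(max_ratio=100.0, num_steps=5):
        counts = class_counts(10, 1000, ratio)
        idx = subsample_indices(labels, counts, rng)
        print(f"ratio={ratio:g} sizes={counts.tolist()} total={idx.size}")
```

In such a setup, a trained classifier would be scored once per subsampled testing set and the resulting accuracies passed to `summarize`; a method tuned only for a balanced testing set would be expected to show a larger spread across the series.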