Long-tailed recognition (LTR) is the task of learning high-performance classifiers from training data in which the number of samples per category is extremely imbalanced. Most existing works address the problem either by enhancing the features of tail classes or by re-balancing the classifiers to reduce the inductive bias. In this paper, we look into the root cause of the LTR task, i.e., the training samples for each class are greatly imbalanced, and propose a straightforward solution. We split the categories into three groups, i.e., many, medium and few, according to the number of training images. The three groups of categories are predicted separately to reduce the difficulty of classification. This idea naturally raises a new problem: how to assign a given sample to the right class group. We introduce a mutually exclusive modulator that estimates the probability of an image belonging to each group. In particular, the modulator consists of a lightweight module and is learned with a mutually exclusive objective. Hence, the output probabilities of the modulator encode cues about the data volume of the training dataset. They are further used as prior information to guide the prediction of the classifier. We conduct extensive experiments on multiple datasets, e.g., ImageNet-LT, Place-LT and iNaturalist 2018, to evaluate the proposed approach. Our method achieves competitive performance compared to state-of-the-art methods.
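The grouping and modulation steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the thresholds (more than 100 images for "many", fewer than 20 for "few"), the function names `split_classes` and `modulated_prediction`, and the combination rule, which multiplies the modulator's group probability by a softmax computed within each group. The paper's actual modulator and objective may differ.

```python
import math

# Assumed thresholds (a common long-tailed convention, not stated in the abstract).
MANY_MIN = 100   # classes with more than 100 training images -> "many"
FEW_MAX = 20     # classes with fewer than 20 training images -> "few"

GROUPS = ("many", "medium", "few")


def split_classes(train_counts):
    """Map each class index to a group name based on its training image count."""
    groups = {}
    for cls, n in enumerate(train_counts):
        if n > MANY_MIN:
            groups[cls] = "many"
        elif n < FEW_MAX:
            groups[cls] = "few"
        else:
            groups[cls] = "medium"
    return groups


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_prediction(class_logits, group_logits, class_to_group):
    """Hypothetical combination rule: P(class) = P(group) * P(class | group).

    class_logits: one classifier logit per class.
    group_logits: three modulator logits, ordered as GROUPS.
    The returned probabilities sum to 1 when every group has at least one class.
    """
    group_prob = dict(zip(GROUPS, softmax(group_logits)))
    probs = [0.0] * len(class_logits)
    for g in GROUPS:
        members = [c for c, name in class_to_group.items() if name == g]
        if not members:
            continue
        # Classes compete only against other classes in the same group.
        within = softmax([class_logits[c] for c in members])
        for c, p in zip(members, within):
            probs[c] = group_prob[g] * p
    return probs


# Toy usage: six classes with made-up training counts and logits.
counts = [500, 150, 60, 25, 10, 5]
mapping = split_classes(counts)
probs = modulated_prediction(
    class_logits=[2.0, 1.0, 0.5, 0.2, 1.5, 0.1],
    group_logits=[0.1, 0.2, 3.0],  # the modulator favors the "few" group
    class_to_group=mapping,
)
```

In this toy setting the classifier's largest raw logit belongs to a head class, but the group prior shifts the final prediction toward a tail class, which is the kind of guidance the abstract attributes to the modulator.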