Long-tailed data is prevalent in real-world classification tasks, and learning from it relies heavily on supervised information, which makes the annotation process exceptionally labor-intensive and time-consuming. Unfortunately, although weakly supervised learning is a common approach to mitigating labeling costs, existing weakly supervised methods struggle to preserve sufficient supervised information for tail samples, resulting in a decline in accuracy on the tail classes. To alleviate this problem, we introduce a novel weakly supervised labeling setting called Reduced Label. The proposed setting not only avoids the loss of supervised information for tail samples but also decreases the labeling cost of long-tailed data. Additionally, we propose a straightforward and highly efficient unbiased framework with strong theoretical guarantees for learning from these Reduced Labels. Extensive experiments on benchmark datasets, including ImageNet, validate the effectiveness of our approach, which surpasses state-of-the-art weakly supervised methods.