Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels

Conventional multi-label classification (MLC) methods assume that all samples are fully labeled and identically distributed. Unfortunately, this assumption is unrealistic for large-scale MLC data, which typically follows a long-tailed (LT) distribution and carries only partial labels (PL). To address this problem, we introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC), which jointly considers these two imperfect learning environments. Not surprisingly, we find that most LT-MLC and PL-MLC approaches fail to solve PLT-MLC, resulting in significant performance degradation on the two proposed PLT-MLC benchmarks. Therefore, we propose an end-to-end learning framework: \textbf{CO}rrection $\rightarrow$ \textbf{M}odificat\textbf{I}on $\rightarrow$ balan\textbf{C}e, abbreviated as \textbf{\method{}}. Our bootstrapping philosophy is to simultaneously correct missing labels (Correction) wherever the prediction confidence exceeds a class-aware threshold and to learn from these recalled labels during training. We next propose a novel multi-focal modifier loss that simultaneously addresses head-tail imbalance and positive-negative imbalance, adaptively modifying the attention paid to different samples (Modification) under the LT class distribution. In addition, we develop a balanced training strategy that distills the model's learning effect from head and tail samples, and we design a balanced classifier (Balance) conditioned on the head and tail learning effect to maintain stable performance across all samples. Our experimental study shows that the proposed \method{} significantly outperforms general MLC, LT-MLC, and PL-MLC methods in terms of effectiveness and robustness on our newly created PLT-MLC datasets.
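
To make the Correction idea concrete, the following is a minimal sketch of recalling missing labels with a class-aware threshold. It is an illustration only, not the implementation behind \method{}: the function name `correct_missing_labels`, the convention that `0` marks an unannotated entry, and the threshold values are all assumptions introduced here, since the abstract does not specify how the thresholds are chosen or updated.

```python
from typing import List, Tuple


def correct_missing_labels(
    probs: List[List[float]],
    observed: List[List[int]],
    thresholds: List[float],
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Recall likely-missing positives using one threshold per class.

    probs:      per-sample, per-class predicted probabilities in [0, 1]
    observed:   partial labels (1 = annotated positive, 0 = not annotated);
                this encoding is an assumption of the sketch
    thresholds: one confidence threshold per class (hypothetical values)
    """
    corrected: List[List[int]] = []
    recalled: List[List[bool]] = []
    for p_row, y_row in zip(probs, observed):
        # An entry is recalled only if it was unannotated and the model's
        # confidence exceeds the threshold for that specific class.
        rec_row = [y == 0 and p > t for p, y, t in zip(p_row, y_row, thresholds)]
        corrected.append([1 if r else y for r, y in zip(rec_row, y_row)])
        recalled.append(rec_row)
    return corrected, recalled


# Usage: two samples, three classes, with a different threshold per class.
probs = [[0.9, 0.4, 0.7], [0.2, 0.6, 0.95]]
observed = [[1, 0, 0], [0, 0, 0]]
thresholds = [0.8, 0.5, 0.9]
corrected, recalled = correct_missing_labels(probs, observed, thresholds)
print(corrected)
```

In this sketch, annotated positives are never altered and the recalled mask is returned separately, so a training loop could weight recalled labels differently from annotated ones.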