Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels

Large-scale image datasets are often partially labeled: for each image, the labels of only a few categories are known. Assigning pseudo-labels to the unknown labels to gain additional training signal has become a prevalent way to train deep classification models. However, some pseudo-labels are inevitably incorrect, which notably degrades classification performance. In this paper, we propose Category-wise Fine-Tuning (CFT), a method that aims to reduce the model inaccuracies caused by wrong pseudo-labels. Specifically, CFT uses only the known labels, without any pseudo-labels, to fine-tune the per-category logistic regressions of a trained model one at a time, calibrating the model's predictions for each category. CFT also uses a Genetic Algorithm, which is seldom used for training deep models, to maximize classification performance directly. Unlike most existing methods, which train models from scratch, CFT is applied to models that are already well trained. CFT is therefore general and compatible with models trained with different methods and schemes, as our extensive experiments demonstrate. Calibrating each category takes only a few seconds on consumer-grade GPUs. We achieve state-of-the-art results on three benchmark datasets: the CheXpert chest X-ray competition dataset (ensemble mAUC 93.33%, single model 91.82%), partially labeled MS-COCO (average mAP 83.69%), and Open Images V3 (mAP 85.31%). These four results outperform the previous best results by 0.28%, 2.21%, 2.50%, and 0.91%, respectively. The single-model result on CheXpert has been officially evaluated by the competition server, which confirms its correctness. These results, together with the method's generality, indicate that CFT could be broadly useful in classification model development. Code is available at: https://github.com/maxium0526/category-wise-fine-tuning.
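The core idea of fine-tuning each category's logistic regression on known labels only can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes frozen backbone features, a boolean mask marking which labels are known, and plain gradient descent on binary cross-entropy. The function names, hyperparameters, and synthetic data are all hypothetical, and the Genetic Algorithm variant is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def known_bce(feats, labels, known, w, b):
    """Binary cross-entropy of one category, computed over known labels only."""
    p = np.clip(sigmoid(feats[known] @ w + b), 1e-7, 1 - 1e-7)
    y = labels[known]
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def finetune_category(feats, labels, known, w, b, lr=0.1, steps=200):
    """Fine-tune one category's logistic regression (assumed setup).

    feats:  (N, D) features from the frozen, already-trained backbone
    labels: (N,) values in {0, 1}; entries where known is False are ignored
    known:  (N,) boolean mask, True where the label is actually annotated
    """
    x, y = feats[known], labels[known].astype(float)
    w, b = w.copy(), float(b)
    for _ in range(steps):
        grad = sigmoid(x @ w + b) - y        # gradient of BCE w.r.t. the logit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def category_wise_finetune(feats, labels, known, W, b, **kw):
    """Update each category's weights independently, one column at a time."""
    W, b = W.copy(), b.copy()
    for c in range(W.shape[1]):
        W[:, c], b[c] = finetune_category(
            feats, labels[:, c], known[:, c], W[:, c], b[c], **kw)
    return W, b

# Synthetic stand-in data: 500 images, 8-dim features, 3 categories,
# with roughly 30% of the labels known per category.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 8))
labels = (feats @ rng.normal(size=(8, 3)) > 0).astype(float)
known = rng.random((500, 3)) < 0.3
W0, b0 = np.zeros((8, 3)), np.zeros(3)
W1, b1 = category_wise_finetune(feats, labels, known, W0, b0)
```

Because each category is updated on its own and unknown entries are masked out before any gradient is computed, whatever value an unknown (or pseudo-labeled) entry holds has no effect on the fine-tuned weights.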
