The key to multi-label image classification (MLC) is to improve model performance by leveraging label correlations. However, overemphasizing co-occurrence relationships has been shown to cause the model to overfit, which ultimately degrades performance. In this paper, we present a causal inference framework showing that the correlative features arising from the target object and its co-occurring objects can be regarded as a mediator, which has both positive and negative effects on model predictions. On the positive side, the mediator improves recognition performance by capturing co-occurrence relationships; on the negative side, it has a harmful causal effect: the model may incorrectly predict the target object when only its co-occurring objects are present in the image. To address this problem, we propose a counterfactual reasoning method that measures the total direct effect, which is achieved by enhancing the direct effect caused by the target object alone. Because the location of the target object is unknown, we propose patching-based training and inference, which divide an image into multiple patches and identify the pivot patch that contains the target object. Experimental results on multiple benchmark datasets with diverse configurations show that the proposed method achieves state-of-the-art performance.
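The patching idea can be illustrated with a minimal sketch. This is not the paper's implementation: the non-overlapping grid split, the per-label argmax used to pick the pivot patch, and the names `split_into_patches`, `pivot_patch`, and `toy_scorer` are all assumptions made for illustration, and `toy_scorer` is a hypothetical stand-in for a trained classifier.

```python
import numpy as np

def split_into_patches(image, grid):
    """Split an H x W x C image into grid x grid non-overlapping patches."""
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid  # trailing rows/columns that do not fit are dropped
    return [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(grid)
        for c in range(grid)
    ]

def pivot_patch(patch_scores):
    """For each label, return the index of the highest-scoring patch.

    Assumption: the pivot patch is taken to be the argmax over patches.
    """
    return np.argmax(patch_scores, axis=0)

def toy_scorer(patch):
    """Hypothetical classifier stand-in: mean intensity per channel, one channel per label."""
    return patch.reshape(-1, patch.shape[-1]).mean(axis=0)

# Toy 4 x 4 image with two "labels" encoded as channels.
image = np.zeros((4, 4, 2))
image[0:2, 2:4, 0] = 1.0  # evidence for label 0 in the top-right patch (index 1)
image[2:4, 0:2, 1] = 1.0  # evidence for label 1 in the bottom-left patch (index 2)

patches = split_into_patches(image, grid=2)
scores = np.stack([toy_scorer(p) for p in patches])  # shape: (num_patches, num_labels)
pivots = pivot_patch(scores)
print(pivots.tolist())
```

In this toy setup each label's pivot is the patch that contains its evidence, so a prediction for that label could be based on the pivot patch rather than on context from co-occurring objects elsewhere in the image.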